ABSTRACT


Although rating scales of varied forms have been widely used to
estimate and evaluate handling qualities over the past decade, a number
of deficiencies in both method and data base have been apparent. This
investigation was aimed at overcoming many of these deficiencies by
attempting to resolve the difficulties experienced with rating scales
themselves, and by extending and adding to already existing relationships
between ratings and pilot/vehicle system parameters.

Rating scales have come under increasing criticism for problems
related to wording ambiguity, the dual mission character of some scales,
the nonuniformity in the distribution of descriptors across the scale,
and the misuse of scales which has occurred when ratings have been
averaged. Psychometric methods provide an approach to these problems,
and in this study were used to scale several phrases descriptive of
vehicle handling qualities. Thus, quantitative characteristics were
derived for contemporary scales through the use of a scaling technique
known as the "Method of Successive Intervals," with data for the method
obtained from a survey experiment.

An experiment was conducted which added to available data relating
Cooper ratings and pilot/vehicle parameters, and which also tested some
potential alternate scale candidates. The correlation results indicate
that ratings are probably based on performance and the degree of
difficulty experienced in maintaining the performance. The difficulty is
most easily represented by the pilot equalization required and the
vehicle stick characteristics.


SECTION I


INTRODUCTION


The suitability of a manually controlled vehicle to serve its
intended purpose is ultimately assessed by a series of judgments.
Perhaps the most difficult portion of such an assessment is the
evaluation of the vehicle's handling qualities, which play a key role in
the overall suitability of the vehicle, and yet have in the past been
perplexing even to define satisfactorily. Cooper (1) originally proposed
a handling qualities rating scale which found wide acceptance.
Subsequently, modifications and variations were proposed and used in
special applications [for example, Harper (2)]. As experience with
rating scales accumulated, the amount of information desired from
them also increased, resulting in inconsistencies and confusion in
the interpretation and use of the ratings. The problem was further
compounded when the engineer, who was charged with producing a suitable
vehicle, faced the task of extrapolating the rating data to increasingly
complex vehicle systems.

The purpose of this study is to attempt to overcome some of the rating
scale difficulties encountered in the decade of experience with the scales,
to structure the evaluation problem in terms that can be applied to future
pilot/vehicle systems, and to extend our knowledge of the causal factors
of pilot ratings, i.e., the relationship between ratings and pilot/vehicle
system parameters.

The study naturally divides itself into two parts. Many of the
problems with contemporary scales are independent of a specific rating
situation, and are related to the semantics, definitions, and structure
of the scale itself. These problems are investigated in Section II,
which attempts to define handling qualities clearly, and in
Section III, where psychological measurement techniques are used to
evaluate the utility of rating scales in general, and to obtain numerical
data for specific scale terminology.

The second part of this study is concerned with the search for the
physical causes of a pilot's opinion of a vehicle. Section IV describes
a simulation experiment in compensatory tracking, where ratings were
taken at the same time that parameters of interest were measured.
Section V presents and discusses the results of the experiment, and
Section VI reiterates the major findings and conclusions of the study,
and makes several recommendations regarding the use and future of
pilot ratings.


SECTION II


RATING SCALE BACKGROUND AND TASK ELEMENTS DEFINITION

In  the  process  of  measuring  and  evaluating  pilot/vehicle  performance, 
it  is  necessary,  as  one  facet  of  the  investigation,  to  measure  operator 
opinion.  These  subjective  measures  are  in  fact  the  ultimate  evaluation 
of  the  system  and  consequently  are  foremost  in  the  designer's  mind 
throughout  vehicle  development.  Unfortunately,  the  current  connections 
between  pilot  ratings,  pilot  behavior,  and  vehicle  characteristics  are, 
at  best,  highly  qualitative.  This  situation  has  not  improved  as  vehicles 
and  associated  pilot/vehicle  handling  qualities  considerations  have 
steadily  increased  in  complexity,  for  then  the  difficulties  with  existing 
rating  scales  and  subjective  measures  become  still  more  obscure.  As 
introductory  background  to  existing  rating  scales,  the  difficulties 
providing  much  of  the  motivation  for  the  current  work  will  be  outlined. 

A. DIFFICULTIES WITH EXISTING RATING SCALES

Several scales for use in handling quality ratings exist, the most
recent and widely used containing ten probably unequal divisions. Primacy
among these can be claimed by a scheme proposed by Cooper (1) and
extensively used by the NACA and NASA. The scale is shown in Table I.
In spite of its ten subdivisions, it is probably fair to say that the
Cooper scale deliberately emphasizes three handling qualities categories.
The category boundaries are between satisfactory for normal operation and
acceptable for emergency operation (a numerical 3.5), and between the
emergency operation category and unacceptable (a numerical 6.5). Cornell
Aeronautical Laboratory (Harper, 2) has evolved a scale primarily
for use with the many configurations possible with variable-stability
aircraft. This scale is shown in Table II. Their scale is not intended
to emphasize any particular levels. Others have used variants of these
two scales, modified to emphasize particular types of flying operations
such as tracking tasks.

TABLE I

THE ORIGINAL COOPER SCALE (1)

COOPER  ADJECTIVE       MISSION              DESCRIPTION                         PRIMARY MISSION  CAN BE
PR      RATING                                                                   ACCOMPLISHED?    LANDED?

 1      Satisfactory    Normal operation     Excellent, includes optimum         Yes              Yes
 2                                           Good, pleasant to fly               Yes              Yes
 3                                           Satisfactory, but with some         Yes              Yes
                                             mildly unpleasant characteristics
 4      Unsatisfactory  Emergency operation  Acceptable, but with unpleasant     Yes              Yes
                                             characteristics
 5                                           Unacceptable for normal operation   Doubtful         Yes
 6                                           Acceptable for emergency operation  Doubtful         Yes
                                             (stab. aug. failure) only
 7      Unacceptable    No operation         Unacceptable even for emergency     No               Doubtful
                                             condition (stab. aug. failure)
 8                                           Unacceptable - dangerous            No               No
 9                                           Unacceptable - uncontrollable       No               No
10      Unprintable     What mission?        Did not get back to report


TABLE II

THE CORNELL AERONAUTICAL LABORATORY SCALE (HARPER, 2)

MISSION SUITABILITY (CAL'S "CATEGORY")

   FLYING QUALITIES

      SATISFACTORY
         Criterion: Mission performance is not seriously affected
         by any flying quality deficiencies which may be present.
         Definition: "Seriously affected" = pilot would ask that
         the deficient characteristics be improved.

      UNSATISFACTORY
         Criterion: Mission performance is sufficiently affected
         by flying quality deficiencies that pilot asks that
         characteristics be fixed.

   AIRCRAFT ACCEPTABILITY

      ACCEPTABLE

      "RELUCTANTLY" ACCEPTABLE
         Criterion: Mission performance deficiencies cannot be
         improved without a serious compromise of the other factors
         which influence the mission capability of the airplane.

      UNFLYABLE

   DESCRIPTIONS (in the order listed)

      Requires major portion of pilot's attention
      Controllable only with a minimum of cockpit duties
      Aircraft just controllable with complete attention
      Control will be lost sometime during mission
      Unflyable


The two scales of Cooper and CAL are not directly comparable point
by point. However, the opinion has been ventured that they are probably
most similar at about the 3.5 level (see Section V), and obviously much
parallelism exists. From a detailed examination and consideration of
the scales, it is plain that difficulties, if not deficiencies, are
inherent in both. Some of these are listed below:


General:  1. The scales are ordinal, and of such a nature as to
             have practically no chance of having equal intervals
             on some hypothetical underlying interval scale.

          2. The definitions of qualities, tasks, and rating
             descriptors are sometimes vague.

          3. As performance measures, ratings are incomplete.
             They usually are not directly connected with
             specific measurable parameters, so comments and
             detailed analyses are usually needed to discover
             underlying reasons for a given rating.


Cornell:  1. Very poor category delineation (e.g., "Unsatisfactory"
             flying qualities seem to be properties of a
             "Reluctantly Acceptable" aircraft; there are
             apparently no flying quality characteristics
             below "Unsatisfactory," etc.).

          2. Double-duty adjective descriptors (e.g., bad and
             fair).

          3. Incompatible adjectives, i.e., degrees of "goodness"
             (excellent, good, fair, poor, bad, very bad)
             mixed with degrees of "safety" (dangerous) and
             degrees of "controllability" (unflyable).


Cooper:   1. Mixes tasks (normal and emergency conditions).

          2. Mixes mission phases (whatever phases are involved
             in the tests and some hypothetical landing
             operation).

          3. Confusing nomenclature (e.g., "Unsatisfactory" is
             satisfactory for emergency operation).

          4. Incompatible adjective descriptors.
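
The first general point, together with the hazard of averaging ratings on an ordinal scale, can be illustrated numerically. The sketch below is illustrative only: the ratings and the interval values assigned to the ten scale points are hypothetical and are not data from this study. It shows that a comparison of mean ratings can reverse under an order-preserving but unequally spaced relabeling of the scale points, whereas a comparison of medians does not.

```python
from statistics import mean, median

# Hypothetical ratings (lower is better) given to two configurations.
ratings_a = [1, 1, 10]
ratings_b = [3, 3, 4]

# Hypothetical interval values for scale points 1..10. They are strictly
# increasing, so rating order is preserved, but the spacing is unequal.
interval_value = {1: 0.0, 2: 3.0, 3: 6.0, 4: 7.0, 5: 7.5,
                  6: 8.0, 7: 8.3, 8: 8.6, 9: 8.8, 10: 9.0}


def rescale(ratings):
    """Map ordinal scale points onto the assumed underlying interval scale."""
    return [interval_value[r] for r in ratings]


def a_rated_better(summary, a, b):
    """True if configuration A gets the lower (better) summary value."""
    return summary(a) < summary(b)


# Comparison by means: the verdict depends on the assumed spacing.
mean_raw = a_rated_better(mean, ratings_a, ratings_b)
mean_rescaled = a_rated_better(mean, rescale(ratings_a), rescale(ratings_b))

# Comparison by medians: the verdict is the same for both spacings.
median_raw = a_rated_better(median, ratings_a, ratings_b)
median_rescaled = a_rated_better(median, rescale(ratings_a), rescale(ratings_b))

print(mean_raw, mean_rescaled, median_raw, median_rescaled)
```

With these assumed numbers the mean comparison changes sign after rescaling while the median comparison does not, which is one reason averaged ordinal ratings must be interpreted with care.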


Recently, Cooper and Harper (3) took into account some of the
deficiencies of existing scales and published a revised scale (see
Table III). Some experience has been gained with the new scale, and
it appears that some of the difficulties may have been resolved. For
example, the revised Cooper-Harper scale has vastly improved the
compatibility of adjective descriptors. However, the scale is still ordinal
and the question of its quantitative character is as unanswered as with
the previous scales.

TABLE III

THE REVISED COOPER-HARPER SCALE (3)

Difficulties experienced with the use of scales can be divided into
four convenient categories which include the problems just mentioned:

1. Extrapolation of the simulated task to the real flight
   situation. The necessity of using simple simulations
   gives rise to the problem of extrapolating the
   simulation to the actual flight situation. Interpretation
   of the display, and agreement between the experimenter
   and pilot on the objectives of the evaluation, are the
   important factors here.

2. The alternate mission character. Some scales allow for
   a change in mission should the pilot be unable to carry
   out the primary mission (landing the aircraft in the
   event of stability augmenter failure is an example).
   This is perhaps a tenable concept for actual flight
   testing, but becomes increasingly difficult to
   structure as the simulation is simplified.

3. Verbal descriptions and phrases. Incomplete and
   ambiguous scale category descriptors result in an
   undesirable arbitrariness in the calibration between
   real and simulated flight, thereby causing evaluation
   to be nearly a "black art" and lacking in good
   repeatability and consistency across the subject population.

4. A scale's qualitative character. Data relating
   subjective measures with vehicle and operator parameters
   are far from complete. Additionally, past experience
   has shown that the pilot is occasionally unable to
   articulate the primary causes of his discontent. It
   is not surprising, then, that existing scales do not
   solicit opinion expressed in terms of the quantities
   to which the operator is sensitive.


The difficulties of items 1 and 2 can be at least partially alleviated
by carefully defining the conditions under which ratings are taken.
Discussions follow in Section II.B which delineate those areas requiring
special attention from the experimenter. Various alternatives to the
language problem noted in item 3 will be discussed in Section II.C.
Section III will then explore the possibility of a quantitative scale
underlying the contemporary scales. We will then be in a position to
investigate the connections between pilot ratings and system measures, as
noted in item 4. This will be carried out in Sections IV and V.

B. CLARIFICATION AND REFINEMENT OF TASK


An adequate delineation of the types of assessments (and therefore
the corresponding tasks or subtasks) that an operator will be required
to make is a necessary preliminary to any evaluation problem using pilot
opinion as a tool. It is not surprising that some pilots have been
unable to rate a simulated configuration simply because of the
inadequacy of instructions and statement of purposes. If a scheme of
evaluation is to be universally useful, we must improve our understanding of
the task situation and our ability to define it.

1. Mission and Task Elements

A "mission" is the composite of pilot/vehicle functions that must be
performed to fulfill operational requirements. The pilot/vehicle
functions, or mission elements, are properly called "tasks," and are defined
by specifying (a) the control activities required, (b) the environment
affecting the control situation (e.g., random disturbance levels), and
(c) the performance specifications for the pilot/vehicle system. (Note
that by these definitions, the task is redefined when, for example, the
disturbance level is changed; thus a mission could have several parallel
task alternatives which are dependent on environmental conditions.) These
"task elements" will be discussed briefly below.
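
The three-part task definition above can be captured in a small record type. The sketch below is illustrative only: the class and field names, the units, and the numerical values are assumptions made for the example and do not come from the study. It shows that, under this definition, changing the disturbance level alone yields a different task.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Task:
    """A task as defined in the text: control, environment, performance."""
    control_activity: str   # (a) control activities required
    disturbance_rms: float  # (b) environment, reduced to one hypothetical level
    max_rms_error: float    # (c) performance specification (hypothetical units)


@dataclass(frozen=True)
class Mission:
    """A mission is the composite of the tasks that fulfill its requirements."""
    name: str
    tasks: tuple


# Hypothetical example: the same regulation task in calm and in gusty air.
hover_calm = Task("closed-loop regulation", disturbance_rms=0.0, max_rms_error=1.0)
hover_gusty = replace(hover_calm, disturbance_rms=5.0)

# The mission carries both as parallel task alternatives.
mission = Mission("hover and land", tasks=(hover_calm, hover_gusty))

# Changing only the disturbance level redefines the task.
tasks_differ = hover_calm != hover_gusty
print(tasks_differ)
```

Recording all three elements with every rating is what would later allow ratings taken under different disturbance levels or performance requirements to be compared.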

a. Control. When an aircraft is flown manually the pilot is concerned
chiefly either with maintaining the aircraft in a steady condition of
flight or with changing the aircraft from one steady condition to another.
Control is the means to accomplish these ends and is defined in the Handbook
of Astronautical Engineering (50), Sect. 27.5, p. 35, as:

"The development, and application to a vehicle, of
appropriate forces which (1) establish some operating
equilibrium state of vehicle motion (operating-point
control), and (2) restore a disturbed vehicle to its
equilibrium state and/or regulate, within desired
limits, its departure from operating-point conditions
(stabilization)."

Control implies the imposition of commands upon the system and the
suppression of the effects of disturbances. Disturbance suppression
is conventional closed-loop regulation when the pilot is active. Also,
some disturbance suppression capacity is inherent in the craft even when
it is operating unattended. Thus both closed- and open-loop pilot/vehicle
systems are involved in suppressing the effect of disturbances on the
aircraft. Pilot inputs to the craft may be pure commands, which are functions
of time alone, or may depend on some vehicle deviation from a desired state
of motion; so command operations are also both open- and closed-loop in
nature. Therefore control activities in piloted flight have four aspects:

•  Command maneuvers, open-loop

•  Command maneuvers, closed-loop

•  Regulation

•  Unattended operation (open-loop regulation)

Closed-loop features are dominant in the first three aspects: explicitly
for the middle two, and implicitly for open-loop command maneuvers because
these end in closed-loop operations unless the maneuvers are flawlessly
performed. Although the open-loop characteristics can have a large influence
on pilot workload, ratings tend to depend on the closed-loop control
characteristics because most deficiencies will appear only under the difficult
and demanding higher gain conditions. Thus, handling qualities studies have
historically concentrated on closed-loop tracking as the primary evaluation
task and will probably continue to do so for some time to come.

b. System input. Environmental factors influencing the pilot/vehicle
system characteristics and/or response must be specified to completely
define the mission. These factors are most commonly of a system input
nature (either disturbance or command), and since the mission is composed
of tasks, system inputs will be included in the task specification also.
This breakdown is somewhat arbitrary, but useful. An input catalog can be
constructed to show typical command and disturbance inputs [gust and terrain
inputs, ILS spectra, precision approach radar noise, etc.; e.g., Ref. 4] so that
the task may be defined in terms of the input to a high degree of accuracy.

c. Performance specifications. To complete the task definition,
performance specifications must be stated. It is here that mission effects
become apparent. With the definition of "mission" as "required operations,"
and with a catalog of generic tasks, mission effects become a matter of
degree rather than of kind, e.g., scaling of amplitudes, response times,
regulation accuracy, time duration of task, etc. For example, Table IV
summarizes many of the common flying tasks of a command maneuver nature
and could be considered the beginning of a generic task catalog. The
variables are shown in Fig. 1.

The inclusion of environmental and performance specifications will
enable us to avoid the embarrassing conflict that apparently exists
when two vehicles of similar kind are given widely different pilot ratings.
Thus a stability-augmented hovering helicopter is often rated "poor" in
gusty air, while a lunar landing vehicle, which has essentially the same
dynamics, is rated "good." The difference is obviously the disturbance
input. Although the state of the art is not advanced enough at this time,
sufficient data will no doubt exist sometime in the future to enable an
analytical tie to be established between pilot ratings for two different
tasks, where the vehicle dynamics are the same and the task differences
are entirely due to input level and performance requirements. Our ability
to find the tie, however, will depend heavily on record keeping and
instructions to the rater. The rater must evaluate in the context of the mission.

With the mission phase or task completely specified, the designer is
in a position to solicit an evaluation of a specific vehicle, or a
comparison between vehicles. Note that without a complete definition, only
a comparison can be made, and it will be based on some unspecified
performance characteristics. Such a comparison (1) may preclude close
agreement between evaluators, (2) does not really help the designer in
determining the suitability of the vehicle to serve its intended purpose,
and (3) makes it impossible to pass along any useful information to other
experimenters.

2. Bases for Rating: Handling Qualities

With the approach to task definition established, the general factors
influencing pilot opinion of a given task can be discussed. The purpose
here is to indicate some classifications of these factors which will be
helpful in establishing better communication between the engineer and
test pilot.

Handling qualities may be defined as those characteristics which
determine the control nature and behavior of pilot/vehicle systems.
In this context a handling quality is, therefore, any property of the
pilot/vehicle system which relates to open- or closed-loop command or
regulation. Handling qualities thus include any properties or attributes
of the vehicle and the pilot as they interact, either actively or passively,
in the pilot/vehicle system. Vehicle characteristics associated with

•  changing  the  equilibrium  flight  condition 

•  controlled  maintenance  of  equilibrium 

•  unattended  maintenance  of  equilibrium 

•  changes  in  behavior  of  the  equilibrium 

are  clearly  such  properties.  Pilot  behavior  characteristics  necessary 
for  control  are  also  handling  qualities.  These  include 

•  open-loop  command  insertion 

•  kinds  of  control  loops  closed  (airframe  motion 
quantities  sensed  by  the  pilot) 

•  the type of control effort required within each
control loop (e.g., the necessary pilot equalization,
as discussed in Section V.A) to achieve
crossovers compatible with adequate pilot/vehicle
system stability and response

Less direct pilot-connected handling qualities are the attention and skill
(i.e., training and experience) levels needed to generate the pilot
behavior qualities listed above.

Properties of the pilot/vehicle system as an entity are a third kind
of handling quality factor. Examples would include closed-loop
characteristics such as

•  bandwidths (loop closures or crossover frequencies)
of control loops closed

•  average system performance, such as rms errors, in the
presence of representative commands or disturbances

The total pilot/vehicle system characteristics, as a class, reflect only
those pilot and vehicle dynamic interactions which cannot be expressed
just as well by either pilot or vehicle characteristics. This category
is especially sensitive to the external environment as the source of
disturbances.
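
As a concrete instance of the average performance measure mentioned above, the rms error of a sampled tracking record can be computed as follows. This is a minimal sketch: the function name and the sample values are hypothetical, and the error is assumed to be uniformly sampled.

```python
import math


def rms(samples):
    """Root-mean-square value of a sequence of uniformly sampled values."""
    if len(samples) == 0:
        raise ValueError("rms of an empty record is undefined")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


# Hypothetical tracking error record (arbitrary units).
error = [3.0, -3.0, 3.0, -3.0]
rms_error = rms(error)
print(rms_error)
```

Because the measure depends on the command or disturbance applied, an rms error is meaningful for comparison only when the input is specified along with it.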


A comprehensive list of handling qualities could be developed by
extending the above vehicle, pilot, and pilot/vehicle system
characteristics. In such a list, however, the vehicle and pilot properties
are not as obviously interconnected as, in fact, they are forced to be
by pilot adaptability. An alternative scheme is to generalize on those
attributes possessed by the vehicle for which corresponding, or
associated, pilot capabilities exist. Such a classification is given in
Table V, where pilot and vehicle properties are expressed in terms
of the dynamic parameters of manual vehicle control (see Section V
for a more thorough discussion of the parameters).

3.  Abstraction  of  Tasks 

With an understanding of what handling qualities are, the abstraction
of real tasks to simplified simulations can be made using the criterion
that qualities of Table V be observable in the simplified abstraction.
Table VI shows some idealized vehicle configurations for the simple
controlled elements used in the McRuer, et al. (5) investigation and for which
significant data exist. Combinations of the simplified characteristics
are appropriate for general flying tasks involving closed-loop and some
open-loop control. In particular, they are reasonable idealized handling
qualities of the "maneuverability" and "commandability" nature. The
longitudinal cases involving short period dynamics may require the addition
of a stiffening term to approach an idealized situation for "trimmability."

The idealized configurations of Table VI are quite compatible with
simplified displays. In fact, when a rating scale is accompanied by a
task definition, the key factor regarding the display is that the
experimenter and subject agree on interpretation, since the display abstraction
becomes essentially a "mission effect." The experimenter must be explicit
about his objectives. He may very well wish to evaluate multiple handling
qualities, e.g., "controllability" (a function of the controlled element),
"trackability" (a function primarily of the system input), and open-loop
characteristics. Making such an evaluation could very well require two or
three ratings from a subject, and a suitable scale would of course need to
exhibit a flexibility capable of handling such requirements.

SOME HANDLING QUALITIES


TABLE VI

IDEALIZED VEHICLE CONFIGURATIONS

CONTROLLED   CORRESPONDING IDEALIZED VEHICLE
ELEMENT      Longitudinal                          Lateral

Kc           [symbol illegible] → δe for very      [symbol illegible] for very large
             large maneuver margin                 directional stability;
                                                   [symbol illegible] in
                                                   steers-like-a-car control mode

Kc/s         θc → δe, approximation for            [symbol illegible], approximation
             ideal ωsp, ζsp, 1/Tθ2                 for ideal 1/TR

Kc/s²        hc → δe, formation flight;            φc → δa for small 1/TR
             θc → δe for very small maneuver
             margin and large 1/Tθ2

C.  SCALE  LANGUAGE  ALTERNATIVES 

The  discussion  of  the  previous  section  (II-B)  essentially  defined 
the  problems  related  to  task,  mission,  and  simulation,  and  evolved  some 
alternate  ways  to  consider  handling  qualities.  The  language  used  to 
solicit  responses  from  subjects  will  be  discussed  here. 

As noted by Cooper and Harper (3), the pilot evaluation is
intended to meet two objectives: (1) to provide an overall assessment
of the suitability of the vehicle in its intended use (called a "global"
rating by some) and (2) to provide information pertaining to the specific
deficiencies which interfere with the intended use.

The first objective requires that the rater be able to express his
subjective impression of the handling qualities of the vehicle in
performing the required maneuvers. This "impression" is the sum total
of all of the sundry physical factors which contribute to the handling
qualities of the vehicle. Since there is no common physical measure
which integrates all of the factors, a scale must be, in part at least,
in subjective terms.

The  second  objective  requires  that  the  rater  be  able  to  provide 
information  on  specific  problem  areas  to  aid  the  experimenter  or 
designer  in  solving  the  problems.  Thus,  a  language  is  required  which 
is  valid  and  unambiguous  to  as  large  a  population  as  possible  to  minimize 
training  requirements  and  to  maximize  repeatability. 

What handling qualities are was discussed briefly in the previous
section; that information is used here to present some alternative
language possibilities for a scale. Table VII shows various handling
quality related measures and parameters grouped arbitrarily by what might
be called "disciplines." Thus, responses could be solicited from pilots
in each of these groups. For example, raters could be trained in the
engineering language of pilot parameters (column 1). The rater would
then be telling the experimenter what it was about his own responses
that he disliked. The disadvantage of an engineering, or scientific,
language is the high level of training required for selective and
repeatable ratings. Of those questioned, unanimous agreement among
pilots and almost unanimous agreement among engineers was obtained on
this point.* Similar results were obtained on the definitions of
handling qualities based on frequency domain (column 4) characteristics.
It was concluded that a pilot would indeed have a difficult time
remembering and interpreting the distinction between the frequency
domain terms. The ability to assess vehicle and system parameters
(columns 2 and 3) depends heavily on training and also requires a
variety of maneuvers to be performed. Even then it is doubtful that
a pilot could consistently determine the state of sundry frequency
and time response parameters. Past work has shown that performance
is very often not correlated with the pilot's opinion (Refs. 21, 26,
52), so the performance measures of column 5 are unlikely to yield
useful results, even if the pilot could estimate them.


TABLE VII

HANDLING QUALITY RELATED MEASURES OR PARAMETERS IN TERMS OF:*

(1) PILOT             [symbols illegible]

(2) VEHICLE           [symbols illegible]

(3) SYSTEM            [symbols illegible]; tr; overshoot

(4) FREQUENCY DOMAIN  Trimmability; Sensitivity; Maneuverability;
    (See Table V for  Controllability; Open-loop Commandability;
    definitions)      Regulation

(5) PERFORMANCE       Accident Rates; Gunnery Scores; Probable Error

(6) SUBJECTIVE        Precision; Effort (Attention); Range of Tasks;
                      Safety; Comfort; Trustworthiness

*See List of Symbols for definitions.

Column  6  represents  an  attempt  to  define  subjective  piloting 
problems  or  problem  areas.  The  list  could  be  extended  indefinitely, 
but  in  Table  VII  they  have  been  arranged  in  what  is  felt  to  be  an 
order  of  decreasing  validity.  The  table  does  not  imply  that  safety, 
for  example,  is  unimportant,  only  that  a  rater  would  have  difficulty 
comparing  vehicles  based  on  the  ambiguous  quality  "safety." 

The  criteria  used  in  selecting  possibilities  for  a  scale  were  that: 

(1) The language be as natural and unambiguous to the
rater  as  possible  so  that  little  analysis  by  the 
pilot  is  required  during  the  rating  situation. 

(2)  The  language  be  as  descriptive  of  piloting  problems 
or  problem  areas  as  possible. 


*Informal discussions on scale language possibilities were held with
several  persons,  including  six  STI  handling  qualities  engineers,  one  Air 
Force  test  pilot,  and  three  NASA  test  pilots. 
From the discussion above, it is apparent that the subjective words of
column 6, Table VII, are most likely to be suitable. By "suitable," it
is meant that the descriptors should be unambiguous semantically, and
universally valid in the rating situation. The semantic problem can
be tested by a simple survey and is discussed in Section III, while the
validity question can then be considered through actual rating experiments,
which are described in Section IV.

D. SUMMARY

The  conclusions  to  be  drawn  from  the  discussion  to  this  point  are 
that: 


•  The  experimenter /designer  should  draw  from  his 
catalog  of  common  maneuvers  to  construct  a  series 
of  tasks  representative  of  the  mission.  Similar 
tasks  can  be  grouped  so  that  the  differences  between 
them  become  scaling  problems . 

•  The  tasks  can  then  be  abstracted,  if  desired,  to 
simplified  control  situations  capable  of  being 
easily  simulated. 

• The pertinent variables of the evaluation should be
set  down  in  writing.  These  will  include  the  task 
definition,  performance  requirements,  time  duration 
of  task,  interpretation  of  display,  disturbance,  and 
any  other  information  necessary  to  establish  agreement 
between  the  experimenter  and  the  pilot  on  the  purposes 
and  objectives  of  the  evaluation. 

•  A  scale  (or  scales)  is  most  likely  to  be  universally 
applicable and valid if constructed from subjective
descriptions  of  handling  qualities. 


The  problem  of  quantizing  scale  descriptor  candidates  is  quite  complex; 
consequently,  the  entire  following  section  will  be  devoted  to  an 
application  of  psychophysical  measurement  techniques  to  rating  scales. 
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SECTION  III 


DETERMINATION OF THE QUANTITATIVE NATURE OF RATING SCALES

A.  INTRODUCTION 

As  discussed  earlier  in  this  report,  a  major  objective  of  this  study 
is  to  evolve  a  rating  scale  which  has  some  underlying  functional  structure 
so  that  certain  mathematical  operations  may  be  performed  with  pilot  rating 
data.  Our  approach  to  the  problem  will  draw  heavily  on  the  methods  of 
psychometrics. Briefly, we will select a group of phrases which are
possible  candidates  for  a  rating  scale  language.  We  will  then  construct 
an  experiment  (in  the  form  of  a  survey)  to  gather  data  on  the  proposed 
phrases.  The  data  will  then  be  reduced  using  notions  and  techniques 
evolved  from  the  theory  and  methods  of  psychometrics.  Some  of  the  con¬ 
cepts  are  quite  involved;  hence  this  entire  section  will  be  devoted  to 
the  scaling  problem.  Since  most  handling  qualities  engineers  are  not 
familiar  with  the  field  of  psychometrics,  let  us  review  the  fundamentals 
of  the  techniques  we  will  be  using  before  we  construct  the  experiment. 

B. REVIEW OF MEASUREMENT CONCEPTS
1. Types of Scales

If  a  measurement  is  made  on  a  physical  object  with  an  instrument 
(nonhuman)  of  some  sort,  the  measure  is  an  objective  one  and  the  resulting 
data  lie  along  a  physical  continuum.  When  an  observer  estimates  a  measure, 
it  is  a  subjective  judgment  and  the  estimates  lie  along  a  psychological 
continuum. The relationship between the objective and subjective scales
has been studied for many years for certain stimuli (such as estimation
of weight, loudness, pitch, etc.) and is an area of endeavor called
psychophysics.

There  are  several  degrees  of  sophistication  of  psychophysical  scales. 
Table  VIII  repeats  the  measurement  scale  classification  as  found  in 
Rosenblith et al. (6). As will be noted in the table, in order for
means to be legitimately taken, the rating scale must be an interval
scale as a minimum. But the examples of scales in Table VIII are all
TABLE VIII


CLASSIFICATION OF MEASUREMENT SCALES

(From Rosenblith, Ref. 6. Reproduced by permission of John Wiley & Sons.)


Nominal
   Basic empirical operations:    Determination of equality
   Mathematical group-structure:  Permutation group, x' = f(x), where f(x)
                                  means any one-to-one substitution
   Permissible statistics:        Number of cases; Mode; "Information"
                                  measures; Contingency correlation
   Typical examples:              "Numbering" of football players;
                                  Assignment of type or model numbers
                                  to classes

Ordinal
   Basic empirical operations:    Determination of greater or less
   Mathematical group-structure:  Isotonic group, x' = f(x), where f(x)
                                  means any increasing monotonic function
   Permissible statistics:        Median; Percentiles
   Typical examples:              Hardness of minerals; Grades of leather,
                                  lumber, wool, and so forth

Interval
   Basic empirical operations:    Determination of the equality of
                                  intervals or of differences
   Mathematical group-structure:  Linear group, x' = ax + b, a > 0
   Permissible statistics:        Mean; Standard deviation
   Typical examples:              Temperature (Fahrenheit and Celsius);
                                  Position on a line; Calendar time;
                                  Potential energy

Ratio
   Basic empirical operations:    Determination of the equality of ratios
   Mathematical group-structure:  Similarity group, x' = cx, c > 0
   Permissible statistics:        Geometric mean; Harmonic mean;
                                  Per cent variation
   Typical examples:              Length, density, numerosity, time
                                  intervals, work, and so forth;
                                  Temperature (Kelvin)

of  physically  measurable  quantities.  What  about  psychological  quantities 
such  as  vehicle  handling  qualities  where  no  physical  parallel  exists?  Can 
an  "interval  scale"  of  a  purely  subjective  quantity  be  constructed?  The 
work  of  psychologists  in  the  field  of  psychometrics  indicates  that  it  is 
indeed possible. [The excellent works of Guilford (7) and Torgerson (8)
would provide the reader with a thorough background in the field should
he desire to delve further into the details of the subject.] Applications
to problems somewhat akin to the problem being considered here have been
made by, for example, Uhrbrock (9), where scale values were determined
for a large number of rating scale statements regarding an employee's
suitability to be employed as a foreman. Other examples are readily
found in the literature [see, for example, Ferguson (10), Thurstone (11),
or Uhrbrock (12)].
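
The restriction on permissible statistics for ordinal scales can be illustrated with a small numerical sketch. The ratings and the square-root relabeling below are hypothetical and are not data from this study; they are chosen only to show that a comparison of means can reverse under an increasing monotonic relabeling of an ordinal scale, while a comparison of medians cannot.

```python
from math import sqrt
from statistics import mean, median

# Hypothetical ordinal ratings of two vehicles (illustrative only).
ratings_a = [1, 1, 8]
ratings_b = [3, 3, 3]


def relabel(x):
    """An increasing monotonic function: a permissible relabeling of an ordinal scale."""
    return sqrt(x)


def ranks_higher(stat, a, b):
    """True when the chosen statistic places set a above set b."""
    return stat(a) > stat(b)


relabeled_a = [relabel(x) for x in ratings_a]
relabeled_b = [relabel(x) for x in ratings_b]

# The verdict of the mean depends on the labeling; the median's does not.
mean_before = ranks_higher(mean, ratings_a, ratings_b)
mean_after = ranks_higher(mean, relabeled_a, relabeled_b)
median_before = ranks_higher(median, ratings_a, ratings_b)
median_after = ranks_higher(median, relabeled_a, relabeled_b)
```

If ratings are only ordinal, either labeling is equally legitimate, so a conclusion that changes with the labeling (as the mean's does here) carries no meaning.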


The techniques often used in scaling problems of the type we have
here have been derived from notions about the distributions of estimations,
particularly the concepts associated with discrimination thresholds.
Thus, we will have to review some additional measurement concepts. A
class of methods introduced by Fechner [see, for example, Guilford (7)]
measures just noticeable differences (jnd) along the physical continuum
and uses these measures of resolving power as equal units on a scale of
sensation. By assuming (1) that the jnd is proportional to the stimulus
magnitude (Weber's law) and (2) that each jnd represents a constant
increment in sensation, Fechner derived his logarithmic law.

Thus, if s is the stimulus and R is the response, or sensation, then
the difference in stimulus magnitude corresponding to a jnd is

\[ \Delta s = s_2 - s_1 = k_1 s \quad \text{(Weber)} \tag{1} \]

Also,

\[ \Delta R = R_2 - R_1 = k_2 \quad \text{(Fechner)} \tag{2} \]

Combining the two expressions yields Fechner's logarithmic law:

\[ \Delta R = k \, \frac{\Delta s}{s} \tag{3} \]

or

\[ R = k \log s \tag{4} \]
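
The step from Weber's and Fechner's assumptions to the logarithmic law can be checked numerically. The following is a minimal sketch, assuming a hypothetical Weber fraction and threshold (neither is taken from this report): it counts jnd steps by repeatedly applying Eq. 1 and compares the count with the closed-form logarithm.

```python
from math import log


def count_jnds(s_start, s_end, k1):
    """Count whole jnd steps between two stimulus magnitudes.

    Each step raises the stimulus by one jnd, delta_s = k1 * s (Eq. 1).
    """
    steps = 0
    s = s_start
    while s * (1.0 + k1) <= s_end:
        s *= 1.0 + k1
        steps += 1
    return steps


def fechner_response(s, s_threshold, k1, k2=1.0):
    """Sensation above threshold when every jnd adds k2 (Eq. 2).

    The result has the form R = k log s of Eq. 4, with k = k2 / log(1 + k1)
    for discrete steps; for a small Weber fraction this approaches k2 / k1.
    """
    return k2 * log(s / s_threshold) / log(1.0 + k1)


# Hypothetical values: Weber fraction 0.1, threshold 1, stimulus 100.
steps = count_jnds(1.0, 100.0, 0.1)
response = fechner_response(100.0, 1.0, 0.1)
```

Under these assumptions equal stimulus ratios, not equal stimulus differences, produce equal increments in sensation.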


The accuracy of these assumptions has been given considerable attention
subsequently by those interested in measurement, and they can be shown to
be not quite true for some stimuli. A pertinent distinction has to be
made between types of stimuli. If a sensation is produced by adding to a
stimulus, i.e., by increasing its magnitude, such as would be the case in
weight, brightness, or loudness estimation, the nature of the continuum
is quantitative and is called "prothetic." The class of continua including
qualitative and positional aspects of things, such as pitch and length,
is called "metathetic." For this class, a change in sensation seems to
be a result of substituting stimuli rather than adding them. The main
point of this distinction is that in the metathetic domain, the jnd are
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subjectively  equal  over  the  continuum;  whereas  in  the  prothetic  domain, 
the  jnd  grow  rapidly  in  subjective  size  as  we  go  up  the  scale  of  the 
continuum. 

The  significance  of  making  the  metathetic/prothetic  distinction  is 
the  following:  Using  Fechner’s  assumption  (Eq.  2)  we  would  expect  that 
summing  up  a  number  of  jnd's  above  the  absolute  threshold  (say  50)  would 
yield  twice  the  response  as  summing  up  half  that  number  (25),  since  each 
jnd  is  supposed  to  yield  equal  sensation  increments.  But  this  appears 
to  be  only  true  for  metathetic  stimuli  (pitch,  color,  position,  etc.) 
and not for prothetic stimuli (weight, loudness, etc.). Thus, if three
different scaling methods are used to estimate the psychophysical scale
of a prothetic stimulus, "apparent duration" [Churchman (13)], each
procedure yields a different scale. Stevens (14) is thus forced to conclude
that  scaling  methods  employing  the  assumption  of  subjectively  equal  jnd's 
or  discriminal  dispersions,  or  equally  often  noticed  differences,  probably 
do  not  result  in  interval  scales  for  prothetic  stimuli. 

Since  the  "handling  qualities"  of  a  vehicle  are  obviously  qualitative 
characteristics,  we  would  expect  the  continuum  to  be  metathetic.  We  could 
quite  reasonably  make  the  assumption  that  it  is,  which  in  effect  would  be 
defining  the  desired  psychological  continuum  as  being  one  on  which  a  sub¬ 
ject  has  a  constant  sensitivity,  or  discriminability,  across  the  entire 
scale.  Rather  than  make  the  assumption,  however,  we  shall  use  a  scaling 
method  which  yields  the  subjective  size  of  the  sensitivity,  so  that  the 
question of metathetic or prothetic is empirically determined. Let us
say only that evidence indicating a metathetic continuum would be most
welcome, since the resultant scales produced by different scaling methods
tend to be more consistent with one another than is the case for prothetic
continua. The notions which lead to the scaling method to be used in this
study  will  be  discussed  next. 

2. An Intuitive Example of the Scaling Method

Before  writing  down  the  formal  equations  for  the  method  to  be  used, 
called  the  "Method  of  Successive  Categories,"  it  would  be  instructive  to 
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consider  an  example  from  a  field  more  closely  associated  with  psycho¬ 
metric  methods  —  the  evaluation  of  people. 

Let  us  suppose  that  we  have  a  collection  of  descriptions  of  various 
traits  of  people.  Our  problem  is  to  discover  how  suitable  a  person  would 
be  for  a  foreman's  job  by  soliciting  the  appropriate  description  of  him 
from  persons  who  know  him.  If  the  descriptions  have  somehow  been 
previously  scaled,  a  direct  numerical  indication  of  foreman  suitability 
will  be  available.  Uhrbrock  (  9  )  solved  the  scaling  problem  by  applying 
the  "Method  of  Successive  Categories"  (also  called  the  "Method  of 
Successive  Intervals")  as  discussed  briefly  below. 

Several  descriptive  phrases  (called  "items")  of  foremen  were 
collected.  The  descriptions  covered  the  entire  spectrum  of  foreman 
suitability,  from  the  best  to  the  worst.  Each  item  was  then  typed  on 
a small card, and the resulting stack of cards was given to each
participant (called a "rater"). The rater was placed before a row of
boxes (say 11) and asked to sort the cards into the appropriate boxes
using  the  following  rules:  The  box  at  one  end  was  considered  to 
represent  an  "extremely  poor  foreman,"  while  the  box  at  the  other 
end  represented  an  "extremely  good  foreman."  The  boxes  between  the 
two  end  boxes  represented  foremen  between  the  two  extremes.  The  rater 
could  recheck  his  card  placement  as  often  as  necessary  to  satisfy 
himself  that  he  had  ordered  the  cards  correctly.  After  many  raters 
had  sorted  the  cards,  a  histogram  could  be  drawn  for  each  item, 
showing  its  frequency  of  placement  in  each  box.  Although  Uhrbrock 
did  not  publish  his  raw  data,  let  us  hypothesize  that  four  of  the 
items  had  distributions  as  shown  in  Fig.  2a,  where  the  histograms  have 
been  approximated  by  continuous  curves. 

It can be seen in the figure that, for example, most of the raters
put phrase A in box 2, while phrase D was distributed between boxes 8,
9, 10, and 11. After noticing the locations of the means of the phrases,
one might be tempted to say that the amount by which A was better than B
was the same as the amount by which C was better than D, or that
A - B = C - D. That is clearly not the case, because for A and B there
was very little confusion about which was the better phrase, while
considerable confusion existed when C and D were evaluated, as exhibited
by the overlap in the distributions.
The effect of applying the Method of Successive Intervals to the data
is shown sketched in Fig. 2b. In this example, the method (which will be
shown in more detail in the following section) "stretches out" the scale
where the dispersions are small and "squeezes up" the portion of the scale
where dispersions are large until all the dispersions are approximately equal.

The effect of the manipulations on the scale values of the items is
obvious. On the psychological continuum, labeled ψ, the means reflect
our earlier feelings that there was indeed a larger separation between
A and B than between C and D. It is the application of this method to
handling qualities descriptors that we shall work toward in the subsequent
evolution of a rating scale.
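
The card-sorting example can be sketched in a few lines of code. This is a simplified version that assumes equal dispersions for all items (the assumption of the foreman example), fixes the origin at the mean item value, and clips extreme proportions to 0.01 and 0.99 so that the normal deviates stay finite. The clipping rule and the function name are our own assumptions; real box counts would replace any illustrative input.

```python
from statistics import NormalDist, mean


def scale_from_box_counts(counts):
    """Estimate boundary and item scale values from box-sorting frequencies.

    counts[i][b] is the number of raters who placed item i in box b.
    Assumes a unit dispersion for every item (a simplification).
    Returns (boundaries, items) on the psychological continuum.
    """
    inv = NormalDist().inv_cdf
    n_boxes = len(counts[0])
    z = []
    for row in counts:
        total = sum(row)
        running, z_row = 0, []
        for b in range(n_boxes - 1):  # upper boundary of every box but the last
            running += row[b]
            p = min(max(running / total, 0.01), 0.99)  # assumed clipping rule
            z_row.append(inv(p))
        z.append(z_row)
    # With unit dispersions, t_g - m_i = z_ig; averaging isolates each term.
    boundaries = [mean(z_row[g] for z_row in z) for g in range(n_boxes - 1)]
    items = [mean(t - zg for t, zg in zip(boundaries, z_row)) for z_row in z]
    return boundaries, items
```

Boxes in which raters rarely disagree end up far apart on the resulting continuum, and boxes that are often confused end up close together, which is the "stretching" and "squeezing" described above.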


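[Figure 2: (a) hypothetical distributions of items A, B, C, and D over the boxes; (b) the same items on the ψ continuum]

Figure 2. Hypothetical Results of Uhrbrock's (9) Rating Scale Results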
C. SCALE VALUES AS DETERMINED BY THE METHOD
OF SUCCESSIVE INTERVALS

1. Selection of Items to be Scaled

Regardless  of  the  scale  form  finally  selected,  it  will  doubtless 
contain descriptions of one or more traits, each scaled in several
"degrees  of  goodness."  The  fact  that  there  are  not  many  distinct 
"degrees"  which  are  couched  in  simple  terms  requires  that  a  careful 
selection  of  the  candidates  be  made.  So,  for  example,  what  are  ten 
(or  so)  degrees  of  handling  qualities?  "Excellent"  would  probably  be 
fairly  specific  to  most,  but  what  are  some  others? 

To get at this problem, a series of phrases was assembled from various
sources (including rating scales currently in use) which expressed
subjective traits in terms of which a rater might wish to reply in a
rating situation. Degrees of the first five traits of column 6,
Table VII, were included
and were considered to cover the majority of problem areas to which a
rater  would  respond.  An  attempt  was  made  to  include  a  fairly  even  dis¬ 
tribution  across  the  continuum  from  "best"  to  "worst."  Table  IX  shows  the 
distributions  for  the  traits  considered.  The  traits  are  shown  vertically, 
while  degrees  of  goodness  of  the  traits  are  shown  horizontally.  The 
columns  do  not  imply  that  all  traits  in  a  specific  column  have  the  same 
psychological  weight  or  value.  The  procedure  to  be  followed  should  show, 
however,  that  the  degrees  are  in  the  correct  order. 

A form of a graphic rating scale [Guilford (7)] was used to gather
the  necessary  data  for  the  Successive  Interval  Method.  The  graphic 
scale,  which  serves  the  same  purpose  as  the  "boxes"  of  the  foreman 
rating  experiment  of  Section  III-B-2,  is  similar  to  a  technique  used 
by  Lefritz  (15)  to  scale  200  adverb-adjective  combinations.  Unfortunately, 
Lefritz's  items  were  not  directly  suitable  for  a  rating  scale. 

Briefly,  in  our  survey  the  rater  was  instructed  to  read  over  a  list 
of phrases arranged in random order. Then each phrase was presented one
at  a  time.  The  rater  was  to  imagine  he  were  reading  a  handling  qualities 
report  where  the  test  pilot  has  used  the  presented  phrase  in  describing 
a vehicle. The rater was then instructed to indicate his impression of
the vehicle, as gained from the phrase, on a graphic scale with end points
"most favorable" and "least favorable." The survey form together with
the raw data is included in Appendix A. Tables VII and IX were not
available to the raters.

[Table IX: Degrees of Goodness of Various Handling Qualities Descriptors from Table VII]

For example, the phrase "controllable with definitely inadequate
precision" might be responded to by a rater as shown by the x in the
sketch. The distribution of all of the raters surveyed might appear as
shown by the bell-shaped curve in the sketch. A total of 63 persons
contributed their time in scoring 64 phrases, thus providing adequate
data for the subsequent processing.

[Sketch: graphic scale running from "most favorable" to "least favorable," with a rater's mark (x) and a bell-shaped distribution of responses]


2.  The  Method  of  Successive  Intervals 


The particular method we shall use to reduce the survey data is
called the Method of Successive Intervals. This particular method is
based upon the Law of Categorical Judgment, which in turn is derived
from Thurstone's general judgment model [see, for example, Guilford
(7), p. 35, and chap. 10].


Consider an observer comparing two stimuli and evaluating their
relative values with respect to some attribute. Thurstone's model for
such a process is given by


\[ m_g - m_i = z_{ig} \left( \sigma_i^2 + \sigma_g^2 - 2 r_{ig} \sigma_i \sigma_g \right)^{1/2} \tag{5} \]

where

$m_g$, $m_i$ are the scale values of the i and g stimuli
along the psychological (ψ) continuum

$z_{ig}$ is the normal deviate corresponding to the proportion of
times that g was judged greater than i

$\sigma_i$, $\sigma_g$ are the discriminal dispersions of i and g, i.e.,
the standard deviations of the distribution of
responses to i and g on the ψ continuum

$r_{ig}$ is the correlation between i and g
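
As a minimal sketch of how Eq. 5 is used, the function below converts an observed proportion of "g greater than i" judgments into a scale separation. The function name and the default of zero correlation are our own choices for illustration; the inverse normal comes from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def scale_separation(p_g_over_i, sigma_i, sigma_g, r_ig=0.0):
    """Return m_g - m_i from Eq. 5.

    p_g_over_i is the proportion of times g was judged greater than i
    (strictly between 0 and 1); z_ig is the corresponding normal deviate.
    """
    z_ig = NormalDist().inv_cdf(p_g_over_i)
    spread = sqrt(sigma_i ** 2 + sigma_g ** 2 - 2.0 * r_ig * sigma_i * sigma_g)
    return z_ig * spread
```

A proportion of one half gives zero separation, and for a fixed proportion the separation grows with the dispersions, which is why heavily overlapping items need not be close together when their dispersions are large.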
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The  assumptions  made  in  constructing  the  judgment  model  are: 


a.  Each  stimulus  gives  rise  to  a  "discriminal  process" 
which  has  some  value  on  the  continuum. 

b.  When  presented  with  the  stimulus  a  large  number  of 
times,  the  observer,  or  rater,  responds  with  a 
distribution  of  processes  because  of  fluctuations 
within  the  observer. 

c. The resulting distribution on the psychological
continuum  is  normal,  with  a  mean  called  the  scale 
value  and  a  standard  deviation  called  the  discriminal 
dispersion. 

In the derivation of the successive interval notions, some additional
assumptions  are  made: 


d.  The  psychological  continuum  can  be  divided  into 
categories,  and  the  category  boundaries  exhibit 
a  fluctuating  value  along  the  continuum  similar 
to  stimuli.  The  category  boundaries  can  then  be 
treated  as  a  stimulus. 

e.  The  dispersions  associated  with  the  boundaries  are 
assumed  to  be  constant  across  the  continuum. 

f. The correlation between momentary positions of
two stimuli is zero ($r_{ig} = 0$).

These assumptions reduce Eq. 5 to

\[ t_g = m_i + s_i z_{ig} \tag{6} \]

where now a boundary scale value, $t_g$, has been substituted for the
stimulus scale value, $m_g$. We now have in Eq. 6

$t_g$ = upper boundary of the gth category

$m_i$ = the scale value of item i

$s_i$ = the discriminal dispersion for item i

$z_{ig}$ = the normal deviate corresponding to the
cumulative proportion of the gth category
for item i


Thus the Law of Categorical Judgment is reduced to the notion that the
difference in scale values between a stimulus (phrase, in our case)
and the category boundary is equal to the normal deviate (z) corresponding
to the proportion of times that the boundary is judged greater than the
phrase, times the measure of dispersion (s) of the phrase.
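
Equation 6 can be read in both directions, as the following sketch suggests. The function names and the sample values in the usage note are hypothetical; the only model assumed is the normal distribution of responses stated in assumption c.

```python
from statistics import NormalDist


def cumulative_proportions(m_i, s_i, boundaries):
    """Predicted proportion of responses to item i at or below each boundary t_g.

    Follows from Eq. 6 with normally distributed responses:
    z_ig = (t_g - m_i) / s_i, and the proportion is the normal CDF of z_ig.
    """
    phi = NormalDist().cdf
    return [phi((t_g - m_i) / s_i) for t_g in boundaries]


def boundary_from_item(m_i, s_i, z_ig):
    """Eq. 6 itself: t_g = m_i + s_i * z_ig."""
    return m_i + s_i * z_ig
```

For an item with scale value 5 and unit dispersion, for example, a boundary at 5 is predicted to split the responses evenly, and a boundary one dispersion higher captures roughly 84 percent of them.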

A particular application of these notions was made by Diederich (16),
where a procedure was derived to minimize the mean-square error between
the model (Eq. 6) and the actual data. Cumrey (17) computerized the
procedure so that a routine is available to minimize the error expression,

\[ E = \sum_{i=1}^{n} \sum_{g=1}^{k} \left( m_i + s_i z_{ig} - t_g \right)^2 \tag{7} \]

where n = the number of items or phrases
k = the number of categories

The routine then uses the normal deviates ($z_{ig}$) obtained from the survey
as the actual data, and through an iterative procedure determines the
values of $m_i$, $s_i$, and $t_g$ which minimize the E of Eq. 7.

Certain additional restrictions and conditions are made in the
procedure, which can be found in the paper (17). This procedure differs
slightly from the example cited earlier (the suitability for foreman
problem)  in  that  here  the  dispersions  are  not  assumed  constant  but  are 
subject  to  empirical  test.  The  determination  of  the  "best  fit"  disper¬ 
sions,  then,  will  provide  a  check  on  our  earlier  feelings  that  opinions 
of  handling  qualities  are  metathetic  in  nature. 
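
To make the minimization of Eq. 7 concrete, the sketch below uses a simple alternating least-squares iteration. It is not the routine of Refs. 16 and 17, whose restrictions are not reproduced here. It assumes a complete matrix of finite normal deviates and fixes the otherwise arbitrary origin and unit by requiring a mean boundary of zero and a mean dispersion of one; both conventions are our own assumptions.

```python
from statistics import mean


def fit_successive_intervals(z, iterations=50):
    """Alternating least-squares sketch for the error expression of Eq. 7.

    z[i][g] is the normal deviate for item i at category boundary g.
    Returns (m, s, t, error) with mean(t) = 0 and mean(s) = 1.
    """
    n, k = len(z), len(z[0])
    # Start from the equal-dispersion solution.
    t = [mean(z[i][g] for i in range(n)) for g in range(k)]
    m, s = [0.0] * n, [1.0] * n
    for _ in range(iterations):
        t_bar = mean(t)
        # Step 1: for each item, regress t_g on z_ig (slope s_i, intercept m_i).
        for i in range(n):
            z_bar = mean(z[i])
            s_zz = sum((zg - z_bar) ** 2 for zg in z[i])
            s_zt = sum((zg - z_bar) * (tg - t_bar) for zg, tg in zip(z[i], t))
            s[i] = s_zt / s_zz
            m[i] = t_bar - s[i] * z_bar
        # Step 2: each boundary is the mean of its predictions over all items.
        t = [mean(m[i] + s[i] * z[i][g] for i in range(n)) for g in range(k)]
        # Step 3: fix the origin and unit of the scale.
        shift, unit = mean(t), mean(s)
        t = [(tg - shift) / unit for tg in t]
        m = [(mi - shift) / unit for mi in m]
        s = [si / unit for si in s]
    error = sum((m[i] + s[i] * z[i][g] - t[g]) ** 2
                for i in range(n) for g in range(k))
    return m, s, t, error
```

Because the dispersions are fitted rather than fixed, the spread of the returned `s` values gives the empirical check on equal dispersions described in the text.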

3. Results of the Experiment

In  addition  to  subjecting  the  survey  data  to  the  successive  interval 
program,  some  rather  simple  manipulations  were  also  made  to  yield  the 
means  and  variances  of  the  raw  scores,  as  well  as  the  means  and  variances 
of  "transformed"  scores,  where  the  end  points  of  the  scale  were  fixed  for 
all  raters  by  making  a  simple  linear  transformation  to  the  raw  data. 

Since  the  raw  and  transformed  scores  lend  considerable  insight  into  the 
nature  of  the  rating  scale  problem,  those  results  will  be  discussed  first. 
a. Means and Variances of the Raw Scores. To determine the nature
of the responses from the 63 raters, a computer routine was used to
compute the mean and variance of each item, then print out the items rank
ordered according to their means. The results are shown in Appendix B,
Table B-I, columns 1 and 2. The "most favorable," or top of the axis
in the survey, was arbitrarily labeled zero, while the bottom was
labeled  ten.  An  indication  of  the  semantic  ambiguity  of  the  ratings 
is  obtained  by  plotting  the  variance  of  the  item  against  the  item 
position  along  the  scale,  and  is  shown  in  Fig.  3a.  As  can  be  seen, 
the  items  become  increasingly  ambiguous  in  the  middle  part  of  the  scale, 
where  standard  deviations  as  high  as  1.5  occur.  The  curve  also  shows  a 
definite  skew  toward  the  bad  end  of  the  scale.  The  relative  ambiguity  of 
descriptors  can  be  assessed  by  carefully  studying  columns  1  and  2  of 
Table  B-I.  Notice,  for  example,  that  "very  poor  handling  qualities" 
and  "bad  handling  qualities"  (items  48  and  57,  p.  B-4)  convey  the 
same  meanings  to  raters  based  on  their  means.  An  attempt  was  made 
to reduce the dispersions of Fig. 3a by constraining all raters to
abide  by  the  same  rules.  To  do  this,  a  simple  transformation  routine 
was  developed. 
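
The raw-score reduction described above amounts to the following. The sketch assumes the scores are held as a mapping from phrase to the list of raters' marks on the 0 to 10 axis, and it uses the population variance; the report does not say which variance formula the original routine used, so that choice is an assumption.

```python
from statistics import mean, pvariance


def rank_items(scores):
    """Return (phrase, mean, variance) rows ordered by mean score.

    scores maps each phrase to a list of raw marks, with 0 the
    "most favorable" end of the axis and 10 the "least favorable" end.
    """
    rows = [(phrase, mean(marks), pvariance(marks))
            for phrase, marks in scores.items()]
    return sorted(rows, key=lambda row: row[1])
```

Plotting the third element of each row against the second would give a curve of the kind described for Fig. 3a.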

b.  Means  and  Variances  of  the  Transformed  Scores.  Let  us  assume 
that  a  rating  scale  in  its  final  form  will  have  numerals  associated  with 
it,  and,  further,  that  there  are  two  points  along  the  scale  to  which  we 
could insist that everyone rate in common. Ideally, the two points
would  demonstrate  a  low  variability  semantically.  Two  such  points  are 
available,  one  at  each  end  of  the  scale.  At  the  good  end  is  "excellent 
handling  qualities,"  while  "uncontrollable"  is  universally  agreed  upon 
to  fall  at  the  bad  end.  Both  of  these  phrases  have  very  low  variances 
associated with them. By insisting that all raters should have placed
these  two  phrases  at  the  same  two  spots  along  the  scale,  we  can  make  a 
linear  transformation  of  all  of  the  scores.  Thus,  if  a  rater  had  a 
tendency  to  bunch  all  of  his  ratings  in  the  middle  of  the  scale,  the 
transformation  would  stretch  them  out.  If  a  rater  had  a  bias  toward 
one end of the scale, the transformation would remove it. The
justification for applying such a routine is that in the final scale, the words
along the scale will be fixed, i.e., all raters will have common end
points. Typical raw scores and transformed scores might appear as in
Fig. 4.

[Figure 3, panel (b): Variance of Transformed Scores (Matrix: 64 x 63)]

Figure 3. Variances of Semantic Judgments of Handling Qualities Phrases


Figure 4. Possible Effects of a Linear Transformation
on the Observed Scores
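
The transformation just described can be sketched as below for a single rater. The anchor values of 0 and 10 are assumed here to match the labeling of the raw axis; the report does not state the values actually used, and the function and argument names are hypothetical.

```python
def transform_rater(scores, good_key, bad_key, good_value=0.0, bad_value=10.0):
    """Linearly map one rater's scores so that two anchor phrases land on fixed points.

    scores maps phrase -> raw mark for a single rater. The rater's marks
    for the two anchor phrases must differ, otherwise no gain is defined.
    """
    gain = (bad_value - good_value) / (scores[bad_key] - scores[good_key])
    offset = scores[good_key]
    return {phrase: good_value + gain * (mark - offset)
            for phrase, mark in scores.items()}
```

A rater who bunched his marks between 1 and 7, for instance, would have them stretched to cover the full 0 to 10 range, which removes both his "bias" (offset) and his "gain."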

The results of obtaining the transformed ratings are shown in Fig. 3b,
where the variances of the transformed scores are shown plotted against
the scores themselves. A comparison of the two sets of variances, those
from the raw scores and transformed scores, shows that the agreement
between raters is made worse by the transformation, if anything. The
conclusion is, then, that rater "bias" and rater "gain" are not
significant factors causing the noted dispersions, and that no advantage will
be gained in further manipulation of transformed scores.
c. Successive Interval Results. The questionnaire data was put
through the successive interval routine of Section III-C-2 twice. The
first time all phrases* were scaled. The phrases were then culled on
the basis of the variance of the raw scores, and the high variability
items (semantically ambiguous) were removed from the list. The remaining
phrases,  shown  in  Table  B-II,  Appendix  B,  were  then  put  through  the 
scaling  routine  again.  This  second  set  of  values  is  a  good  approximation 
to  those  which  would  have  been  obtained  if  only  the  unambiguous  phrases 
had  been  included  in  the  questionnaire  initially.  The  results  of  both 
runs  are  tabulated  in  Appendix  B,  Table  B-I  and  will  be  discussed  in 
the  following  subsection.  The  scale  values  have  been  arbitrarily 
adjusted to a nine point scale, with "excellent handling qualities"
defined as 1.0, and "nearly uncontrollable" defined as 9.0. In a
final  scale  form,  10.0  could  be  reserved  for  "uncontrollable,"  although 
it  would  then  be  inappropriate  to  include  the  10.0  in  any  data  processing. 

The  scale  values  obtained  through  the  Successive  Interval  Method 
are  by  far  the  most  interesting  and  important  results  of  the  survey. 
Before  discussing  their  significance,  however,  it  would  be  appropriate 
to point out that the dispersions of all the items are approximately
equal on the ψ continuum (see column 6, Table B-I, Appendix B). Since
it was not necessary to assume equal dispersions with the particular
mean-square routine used, we have empirically shown that we are dealing
with a metathetic continuum (see Section III-B-1), and we would expect
the  results  obtained  here  to  be  entirely  consistent  with  any  obtained 
through  other  approaches. 

D.  DISCUSSION  OF  RESULTS 

Armed  now  with  legitimate  numerical  scale  values  for  myriad 
descriptive  handling  quality  phrases,  we  can  now  assess  the  numerical 
character  of  contemporary  scales.  First,  to  get  an  idea  of  what  the 


*With the exception of no. 28, uncontrollable. When all responses are
in  the  first  or  last  category,  the  routine  will  not  converge,  which 
reflects  that  no.  28  is  an  absolute  end  point  and  does  not  properly 
deserve  a  scale  value  in  an  interval  scale. 
scale values mean, let us look at the distributions of ψ values for
the degrees of goodness of handling qualities. Then we can consider
the connections between the ψ values and the Cooper ratings. Finally,
we  can  estimate  the  errors  of  past  analyses  which  were  introduced 
through  unjustified  processing  of  data. 

1. Scale Values for Degrees of Goodness of Handling Qualities

The adjectives modifying handling qualities are repeated in Table X
from Appendix B, Table B-I, column 5, and are plotted in Fig. 5.

Several  characteristics  of  rating 
scales  can  be  inferred  from  the 
figure. Notice that the "neutral"
area (that point on the scale which
is neither "more favorable" nor "less
favorable") is in the vicinity of
"fair handling qualities." When the
questionnaire  was  developed,  normal 
practice  dictated  that  the  midpoint 
be  labeled  "neutral,"  but  since  a 
neutral  vehicle  was  difficult  to 
envision,  only  two  tie  points  (the 
end  points)  were  labeled.  We  now 
know  what  "neutral  handling  qualities" 
are — they  are  "fair." 

A considerable amount of interest in the Cooper "boundaries" is
exhibited by most experimenters. A careful comparison of the words
shown in Fig. 5 with the Cooper and Cooper-Harper scales of Tables I
and III, pp. 4 and 7, establishes the probable intersection of these
boundaries with the ψ scale. These "probable areas" are shown in Fig. 5
as the crosshatched bands.

Perhaps  a  key  observation  about  the  scale  is  that  in  terms  of 
discrimination  ability,  the  words  at  the  good  end  of  the  scale  are 


TABLE X

DEGREES OF GOODNESS
OF HANDLING QUALITIES

Adjective                           Scale Value, ψ

Excellent                           1.00
Highly desirable                    2.25
Good                                3.70
Pleasant                            3.71
Fair                                5.3^
Poor                                7.39
Very poor                           7.87
Bad                                 7.97
Very bad                            8.33
Nearly uncontrollable (for ref.)    9.00

Figure 5. Distribution of the Degrees of Goodness
of Handling Qualities Along the ψ Scale
[Figure annotations: discriminal dispersion of "fair handling
qualities" (s = 1); probable psychologically neutral point; probable
Cooper and Cooper-Harper 3.5 and 6.5 boundaries]

much more distinct to a rater than at the bad end. The discriminal
dispersion is sketched on the figure (s = 1) at the "fair" rating.
Recalling that dispersions along the ψ scale are nearly constant, it
can be seen that a considerable amount of overlap (demonstrating
confusion, or ambiguity) in dispersions exists for words at the bad end
of the scale.
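
The overlap argument can be made concrete with a small calculation. The
sketch below is an illustration only and is not part of the survey
analysis: it assumes that each phrase produces independent, normally
distributed responses on the ψ scale with the common dispersion s = 1,
takes the legible scale values from Table X, and estimates how often a
single response to the nominally worse phrase would fall below a response
to the better one. The function name and the normality assumption are
introduced here for illustration.

```python
import math

# psi-scale values from Table X (only fully legible entries are used)
PSI = {
    "excellent": 1.00,
    "highly desirable": 2.25,
    "good": 3.70,
    "poor": 7.39,
    "very poor": 7.87,
    "bad": 7.97,
    "very bad": 8.33,
}


def reversal_probability(better, worse, s=1.0):
    """Chance that one response to `worse` falls below one response to `better`.

    Assumes independent normal responses with equal dispersion s, so the
    difference of two responses has standard deviation s * sqrt(2).
    """
    delta = PSI[worse] - PSI[better]
    z = delta / (s * math.sqrt(2.0))
    # standard normal CDF evaluated at -z
    return 0.5 * (1.0 + math.erf(-z / math.sqrt(2.0)))


if __name__ == "__main__":
    pairs = [
        ("excellent", "highly desirable"),
        ("poor", "very poor"),
        ("very poor", "bad"),
    ]
    for better, worse in pairs:
        print(better, "vs", worse, round(reversal_probability(better, worse), 2))
```

Under these assumptions the reversal probability for neighboring phrases
at the bad end approaches one half, which is the numerical counterpart of
the overlap visible in Fig. 5.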

Thus, if a rater were to have as a tool the words shown in Fig. 5,
we would expect to observe considerably more scatter in the ratings of
a bad vehicle. Evidence supporting this contention is sparse, due to
experimenters' habits of averaging data before publication, but some
support exists in works by Jex (18) and Durand (19). Figure 5 does
lead us to suspect that we are fooling ourselves when we place great
weight on the fine distinctions made by raters near the "bad" end of
current rating scales. Let us look more closely at the connections
between the ψ scale and contemporary rating scales.

2. Connections Between the ψ, Cooper, and Cooper-Harper Scales

The list of phrases presented to the 63 participants in the survey
contained a large proportion of the individual statements describing
Cooper ratings (1) and Cooper-Harper (C-H) ratings (3). A plot of these
statements is shown in Fig. 6. Space is not available to label each
point by its phrase, but Table XI shows the scale values in superscript
following each phrase of the C-H scale. These statements and values
were culled from Appendix B to reconstruct the scale and are the data
correlations at the most fundamental level. It is clear that some of
the phrases describing the poorer ratings are not even ordinal, as shown
by the nonmonotonic nature of the data at the ratings of 7 and 8.

A smoothed-over view of the data is obtained by the curve fit shown
in Fig. 6. The curve fits reasonably well and is given by

$$\psi = 1 + 8 \log R \qquad (8)$$

A fit which is only slightly less accurate, but which would be
easier to handle mathematically in some cases, is

$$R = a\psi^2 + b \qquad (9)$$

and is also shown in Fig. 6. Figure 7 shows a logarithmic plot of the
same data.
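
As a quick numerical check of the two fits, the sketch below evaluates
Eq. 8 and the inverse of Eq. 9 over the rating range. It is illustrative
only. Two assumptions are made: the logarithm in Eq. 8 is taken to be base
10 (this maps R = 1 to ψ = 1 and R = 10 to ψ = 9, matching the end points
of Table X), and the coefficients a = 0.11, b = 0.89 are the values quoted
for Fig. 6 in the error analysis later in this section. The function names
are introduced here.

```python
import math

A, B = 0.11, 0.89  # Eq. 9 coefficients quoted for Fig. 6 later in the section


def psi_from_rating_log(r):
    """Eq. 8: psi = 1 + 8 log R (base-10 logarithm assumed)."""
    return 1.0 + 8.0 * math.log10(r)


def rating_from_psi(psi):
    """Eq. 9: R = a * psi**2 + b."""
    return A * psi ** 2 + B


def psi_from_rating_quadratic(r):
    """Inverse of Eq. 9 on the positive branch (requires r >= b)."""
    return math.sqrt((r - B) / A)


if __name__ == "__main__":
    # Tabulate both fits side by side for whole-number ratings
    for r in range(1, 11):
        print(r,
              round(psi_from_rating_log(r), 2),
              round(psi_from_rating_quadratic(r), 2))
```

The quadratic form is the one carried through the later error and
trial-size calculations because it can be inverted and averaged in closed
form.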

A precise view of the data would take into account the flat spots
in the Cooper and Cooper-Harper ratings around 3-4 and 6-8, which
indicate that the adjectives used to describe differences in these
regions are inadequate for discrimination.


[Table XI. Phrases of the Cooper-Harper scale, each followed by its
ψ scale value in superscript; the table body is illegible in this copy.]


Either the data points themselves or the "smoothed" relationships
of Eqs. 8 and 9 provide a means to average data obtained from contemporary
scales. As will be recalled, to obtain the best estimate of the true
value of a measured quantity, the data to be averaged should come from
an instrument with constant sensitivity along its scale. Thus, although
it could be argued that since ψ and R are functionally related either
could be averaged, the desired quantity is ψ. We have argued earlier
that from other considerations (i.e., prothetic versus metathetic) the
ψ scale data, by virtue of their linearity with subjective magnitude,
should be the quantities which are averaged. The constant discriminal
dispersion and the linearity of subjective magnitude are just two ways
to reach the same conclusion; that is, the best estimate of a rater's
subjective opinion is obtained by averaging the ψ data.

Figure 7. A Comparison of Cooper and Cooper-Harper Ratings
with Corresponding ψ-Scale Values (Log Scale)
[Abscissa: R scale, Cooper and Cooper-Harper ratings]

3. Error Introduced by Averaging Cooper-like Ratings

Let us try to estimate the error which would be introduced by
averaging Cooper ratings directly instead of using the ψ transformation
of Fig. 6. The assumption is that a large number of ratings, R, are
available for one vehicle, so that there is no question of there being
a difference in means. Based on our earlier discussion, the true mean
is obtained by averaging the ψ values for a set of observations. For
convenience in analytic treatment, the "smoothed" fit of Eq. 9 will be
used.

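Let us call this true mean along the R axis $R_T$. On the other hand,
if our habit has been to average the R values directly, the mean would
be $\bar{R}$. Let us define an error, e, which is the difference between
averaging the ratings, R, directly and averaging the ψ values obtained
by transforming R via Eq. 9 and then converting back to R. Let

$$e = \bar{R} - R_T \qquad (10)$$

Then

$$\bar{R} = \frac{1}{N}\sum_{i=1}^{N} R_i \qquad (11)$$

and

$$R_T = a\bar{\psi}^2 + b \qquad (12)$$

Recalling that the variance of ψ is given by

$$\sigma_\psi^2 = \frac{1}{N}\sum_{i=1}^{N}(\psi_i - \bar{\psi})^2 = \overline{\psi^2} - \bar{\psi}^2 \qquad (13)$$

we can solve for $\bar{\psi}^2$ and substitute from Eq. 9,

$$\bar{\psi}^2 = \overline{\psi^2} - \sigma_\psi^2 = \frac{\bar{R} - b}{a} - \sigma_\psi^2 \qquad (14)$$

So, from Eq. 12,

$$R_T = \bar{R} - a\sigma_\psi^2 \qquad (15)$$

Finally, from Eqs. 10 and 15,

$$e = a\sigma_\psi^2 \qquad (16)$$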


From Fig. 6 it is seen that the values a = 0.11, b = 0.89 give a good fit,
and it will be recalled that $\sigma_\psi$ is the discriminal dispersion which we
found from the successive interval method (see Fig. 5 and Appendix B)
and which is approximately constant with a value of unity along the
ψ axis. Thus, the errors obtained by averaging Cooper-Harper ratings
are given by Eq. 16 as 0.11 × 1 ≈ 0.1 Cooper unit. This is an optimistic
calculation since it made the assumption that the only errors in the
rating were due to the nature of the scale itself. Other errors are
likely in the rating situation (e.g., bias between raters due to
training, experience, etc.), so this would be the limiting best case.
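
The result of Eq. 16 can be checked numerically. The following sketch is
an illustration under stated assumptions, not a reproduction of any
computation in the study: it draws hypothetical ψ responses from a normal
distribution with unit dispersion, converts them to ratings with Eq. 9
(a = 0.11, b = 0.89), and compares the direct average of the ratings with
the rating corresponding to the averaged ψ values. The function name, the
seed, and the choice of a mean of 5 are arbitrary.

```python
import random
import statistics

A, B = 0.11, 0.89  # Eq. 9 coefficients read from Fig. 6


def averaging_error(psi_values):
    """Return (e, a * var(psi)) for ratings R_i = a * psi_i**2 + b.

    e is the direct mean of the ratings minus the rating obtained by
    averaging psi first and then applying Eq. 9 (Eqs. 10 to 12).
    """
    ratings = [A * p ** 2 + B for p in psi_values]
    r_bar = statistics.fmean(ratings)                    # average R directly
    r_true = A * statistics.fmean(psi_values) ** 2 + B   # average psi, then map
    return r_bar - r_true, A * statistics.pvariance(psi_values)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    sample = [rng.gauss(5.0, 1.0) for _ in range(10_000)]  # hypothetical data
    e, predicted = averaging_error(sample)
    print(f"e = {e:.4f}, a*sigma^2 = {predicted:.4f}")
```

For the quadratic fit the two returned numbers agree to rounding error for
any sample, since the identity in Eq. 13 is exact; with unit dispersion
both are close to 0.11 Cooper unit.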

4. Determination of Necessary Trial Size

Although we have demonstrated that very little error is introduced
by averaging Cooper ratings directly, the assumption was made that an
adequate quantity of data was available to give a high level of
confidence. Let us see what the sample size requirements are. It should
be obvious from Fig. 5 that more reliable data, in terms of Cooper
ratings, are obtained at the "good" end of the Cooper scale. Let us
consider the case where an experimenter is trying to compare two
slightly different (he thinks) vehicles. In the past, experimenters
have liked to distinguish between vehicles differing by one Cooper
unit. Let us hypothesize that case, and compute the number of trials
that should have been made to achieve a confidence of 95 percent.

The t-test will be used, which requires that the variance of the
data be known and be approximately the same for the two independent
samples. This requirement is reasonably met by the conditions here,
since the variance along the R scale changes very little in one Cooper
unit. We shall have to calculate its magnitude, however, since we only
know that the variance is constant on the ψ scale at this time.

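a. Computation of Variance Along the R Scale. From Eq. 9 we know
that

$$R_i = a\psi_i^2 + b$$

so that

$$R_i \pm \sigma_{R_i} = a(\psi_i \pm \sigma_\psi)^2 + b \qquad (17)$$

Solving for $\sigma_{R_i}$, we obtain

$$\sigma_{R_i} = a\sigma_\psi(2\psi_i \pm \sigma_\psi) \qquad (18)$$

Since $2\psi_i \gg \sigma_\psi$,

$$\sigma_{R_i} \approx 2a\sigma_\psi\psi_i \qquad (19)$$

In terms of the R continuum,

$$\sigma_{R_i}^2 \approx 4a^2\sigma_\psi^2\left(\frac{R_i - b}{a}\right) = 4a\sigma_\psi^2(R_i - b) \qquad (20)$$

Since the t-test requires that the variances of both samples be equal,
we shall calculate $\sigma_R^2$ at the R = m + 1/2 points and let those values
approximate the variances at m and m + 1.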

b. The t-Test for Difference of Means. The minimum trial size can
be simply determined from the t-test. Given two sets of independent
observations, form the sample statistic

$$t_c = \frac{\bar{R}_1 - \bar{R}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \qquad (21)$$

with sample variance $s^2$ and with $n_1 + n_2 - 2$ degrees of freedom [Hald,
p. 591 (20)]. We will specify the minimum difference of means which
we want to detect as $|\bar{R}_1 - \bar{R}_2| = 1.0$ Cooper unit. For an equal
number of observations, n, in each set, the sample statistic becomes

$$t_c = \frac{1}{s}\sqrt{\frac{n}{2}} \qquad (22)$$

Substituting Eq. 20 for $s^2$,

$$t_c = \frac{\sqrt{n}}{2.82\sqrt{0.11R - 0.098}} \qquad (23)$$

which is plotted in Fig. 8 for several values of R.

Figure 8. Trial Size Determination from the t-Test
[Abscissa: n, number of trials in each sample]

The sample statistic, $t_c$, is to be compared with the computed
statistic, $t'_{\alpha,n}$, based on tables of the t-distribution. The tables
give $t'_{\alpha,n}$ at the α-level of confidence and for n − 1 degrees of
freedom. The table values of t' are also plotted in Fig. 8 for
α = 95 percent. The condition indicating a significant difference
in means of 1.0 at the 95 percent confidence level requires that

$$t' < t_c \qquad (24)$$

It can be seen that the number of trials is a function of location
along the R scale, as we originally expected. If the locus of points
where t' = t_c is plotted, the number of trials as a function of R will
be available. This has been done in Fig. 9, where it can be seen that

$$n = 3.5R \qquad (25)$$

Figure 9. Minimum Number of Trials Required in Each of Two Independent
Samples to Determine that the Sample Means are Different by
One Cooper Unit with 95 Percent Confidence
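
The trial-size rule can be approximated numerically as follows. This is a
sketch under stated assumptions rather than the procedure used to draw
Figs. 8 and 9: it uses the variance of Eq. 20 with a = 0.11, b = 0.89 and
unit ψ dispersion, the equal-sample t statistic for a difference of one
Cooper unit, and standard two-sided 95 percent critical values of
Student's t looked up at n − 1 degrees of freedom as stated in the text.
Whether the original tables were one-sided or two-sided is not stated, so
that choice is an assumption, as are the function names.

```python
import math

A, B, SIGMA_PSI = 0.11, 0.89, 1.0  # Eq. 9 fit and psi-scale dispersion

# Two-sided 95 percent critical values of Student's t by degrees of freedom
T_CRIT_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
    7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179,
    13: 2.160, 14: 2.145, 15: 2.131, 16: 2.120, 17: 2.110, 18: 2.101,
    19: 2.093, 20: 2.086, 21: 2.080, 22: 2.074, 23: 2.069, 24: 2.064,
    25: 2.060, 26: 2.056, 27: 2.052, 28: 2.048, 29: 2.045, 30: 2.042,
}


def sigma_r(r):
    """Eq. 20: standard deviation on the R scale at rating r (r > b)."""
    return math.sqrt(4.0 * A * SIGMA_PSI ** 2 * (r - B))


def t_statistic(n, r, diff=1.0):
    """t_c for two samples of n trials each whose means differ by diff."""
    return diff / (sigma_r(r) * math.sqrt(2.0 / n))


def min_trials(r, diff=1.0):
    """Smallest n whose t_c exceeds the tabulated value at n - 1 d.o.f."""
    for n in range(2, 32):
        if t_statistic(n, r, diff) > T_CRIT_95[n - 1]:
            return n
    raise ValueError("no n <= 31 satisfies the test; extend the table")


if __name__ == "__main__":
    # Compare the computed trial size with the linear rule n = 3.5 R
    for r in (2.0, 3.5, 5.0, 6.5, 8.0):
        print(r, min_trials(r), 3.5 * r)
```

Under these assumptions the computed trial sizes track the linear rule of
Eq. 25 closely over the middle of the scale.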

These results are somewhat surprising. For a vehicle which is near
the Cooper "Acceptable/Unacceptable" boundary, Fig. 9 indicates that
approximately 20 trials would be needed for high confidence. Remember,
too, that these calculations are optimistic, i.e., sources of variability
other than semantic ambiguity have not been considered. We have used
the "average rater," one who has the rating characteristics shown in
Fig. 6.

It is highly doubtful that trial sizes on the order of 20 have been
obtained in practice, which means that the level of confidence is lower
than 95 percent in the measures. Here is another reason to keep careful
records and publish the raw data. In any event, Eq. 25 shows that for
any given confidence level, the number of observations made for "bad"
vehicles should be increased an order of magnitude over "good" ratings
if the Cooper scale is used.

E. SUMMARY AND CONCLUSIONS

In this section we have established a rationale for quantitative
handling qualities ratings using psychological measurement techniques.

In addition to determining numerical values for 63 descriptions (see
Appendix B, Table B-I), which should be useful in constructing any scale,
we have shown that contemporary scales (e.g., Cooper) are very nearly
functionally related to the underlying quantitative scale. The smooth
appearance of the function (for example, Fig. 6) demonstrates that a
very large amount of thought and wisdom went into these original scales,
and also demonstrates why subsequent improvement has been so difficult.

The data show that very little error is introduced by averaging Cooper
ratings directly rather than transforming to the quantitative ψ scale.
However, in order to obtain adequate data for averaging, and to place any
weight on differences of one or two Cooper units, a large number of trials
will have to be made, particularly when the vehicle is "bad" (see Fig. 9).

With  an  underlying  quantitative  scale  now  established,  the  next  step 
will  be  to  construct  several  scales,  then  test  them  with  some  actual 
rating  experiments.  In  the  next  section,  the  experiment  will  be  described 
together  with  the  rating  scales  which  were  used  and  the  measurements  which 
were taken.

SECTION IV

DESCRIPTION OF THE EXPERIMENT


A.  OBJECTIVES 

With  the  more  complete  understanding  of  rating  scales  which  has  been 
obtained  in  the  previous  sections,  we  are  now  in  a  position  to  conduct 
rating  experiments.  The  general  objectives  of  the  experimental  program 
are  to  determine  the  factors  which  influence  pilot  opinion  and  to  determine 
if  a  modified  scale  (or  scales)  would  be  an  improvement  over  present  scales. 

B. EXPERIMENTAL PLAN AND SETUP

1. Single-Loop Experiments

a. Simulation. A fixed-base simulator with a CRT display and fighter-
aircraft-type center stick was used for the experiment. Compensatory
tracking in pitch was used for the primary rating task, with the dynamics
being simulated on a GEDA analog computer and displayed with a horizon
bar-like line on the CRT. A roll axis was also mechanized to enable a
secondary tracking task, so that the CRT horizon bar could both pitch and
roll. The sensing was inside-out, i.e., as in a conventional aircraft
artificial horizon, and is sketched in Fig. 10. At the distance that the
pilot sat from the CRT (about 46 cm), a one cm displacement in θ subtended
an angle of 1.25 deg at the pilot's eye. The spring gradients of the stick
were 13 N/cm (7.5 lb/in.) for the elevator (δe) and 3.5 N/cm (2 lb/in.)
for the ailerons (δa).

A random-appearing sum of twelve sinusoids was used as the command input
to the pitch axis. Three bandwidths (1.88, 2.89, and 4.77 r/s) and three
amplitudes (0.5, 1.0, and 1.5 cm rms) of input were available. The
frequencies of the input were selected to be suitable for a 100 sec run
length, and are given in Table XII together with the number of cycles
in a run length. The sinusoids making up the shelf, i.e., the frequencies
beyond the bandwidth, have an amplitude 14 dB down from the main
rectangular portion of the input. A sketch of the spectral characteristics
for the 1.88 r/s, 0.5 cm rms input is shown in Fig. 11, and is labeled
B6"-1.88-0.5 in accordance with the convention used by McRuer (5).

Figure 10. CRT Display for Single-Loop Plus Secondary Tasks

Figure 11. B6"-1.88-0.5 Input Spectrum


TABLE XII

INPUT FREQUENCY COMPONENTS

COMPONENT    COMPONENT FREQUENCY, ω    NO. OF CYCLES/100 SEC,
NO.          (rad/sec)                 n = 100ω/2π

 1            0.188                      3
 2            0.314                      5
 3            0.502                      8
 4            0.816                     13
 5            1.192                     19
 6            1.88                      30
 7            2.89                      46
 8            4.77                      76
 9            7.35                     117
10            9.23                     147
11           12.25                     195
12           15.00                     239
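
A command input of this kind can be sketched in a few lines. The example
below is a hedged illustration and not the analog mechanization used in
the experiment. It assumes the cycle counts implied by n = 100ω/2π for the
twelve components of Table XII (3, 5, 8, 13, 19, 30, 46, 76, 117, 147, 195,
239), a shelf 14 dB below the main components in amplitude, and random
phases, which the text does not specify. All function and variable names
are introduced here.

```python
import math
import random

# Cycles per 100-sec run; each frequency is 2*pi*n/100 rad/sec
CYCLES = [3, 5, 8, 13, 19, 30, 46, 76, 117, 147, 195, 239]
RUN_LENGTH = 100.0   # sec
SHELF_DB = -14.0     # shelf amplitude relative to the main components


def make_input(n_main=6, rms_cm=0.5, seed=0):
    """Return a function theta_c(t) for a sum of twelve sinusoids.

    n_main  -- number of full-amplitude components (6 gives 1.88 r/s)
    rms_cm  -- desired rms amplitude of the whole signal, in cm
    seed    -- seed for the random phases (the phasing is an assumption)
    """
    rng = random.Random(seed)
    shelf = 10.0 ** (SHELF_DB / 20.0)
    rel = [1.0 if k < n_main else shelf for k in range(len(CYCLES))]
    # rms of a sum of sinusoids at distinct frequencies is sqrt(sum(A^2)/2)
    scale = rms_cm / math.sqrt(sum(a * a for a in rel) / 2.0)
    amps = [scale * a for a in rel]
    omegas = [2.0 * math.pi * n / RUN_LENGTH for n in CYCLES]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in CYCLES]

    def theta_c(t):
        return sum(a * math.sin(w * t + p)
                   for a, w, p in zip(amps, omegas, phases))

    return theta_c


def rms_over_run(signal, n_samples=10_000):
    """Root-mean-square value of signal(t) sampled uniformly over one run."""
    dt = RUN_LENGTH / n_samples
    total = sum(signal(i * dt) ** 2 for i in range(n_samples))
    return math.sqrt(total / n_samples)


if __name__ == "__main__":
    theta_c = make_input(n_main=6, rms_cm=0.5)
    print(round(rms_over_run(theta_c), 3))
```

Because every component completes a whole number of cycles in the run
length, the components are orthogonal over the run and the rms value can
be set exactly from the amplitudes alone.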

b. Controlled Elements. As discussed earlier (Section II.B.3),
single-loop compensatory tracking with a few simple controlled elements
will adequately describe a variety of vehicle/task configurations. Thus,
in the main rating experiments we shall use the idealizations studied by
McRuer (5) together with some additional simplified elements necessary
to obviate the requirement for certain inferred correlations related to
pilot lead. A matrix of controlled elements, together with the gains
and the command inputs used with each, is shown in Table XIII (the key
to the inputs of Table XIII is given in Table XIV).

As will be noted from Table XIII, the possible number of configurations
would be considerable if all of the inputs were applied to each controlled
element and gain. In order to make the experiment feasible, the
experimental design had to yield fewer than approximately fifty
configurations, and at the same time obtain enough data to allow the
testing of trends across the many variables (both explicit and implicit)
of interest. A detailed look at the finally selected configurations of
Table XIII will yield the following:

1. Excellent tests of consistency for K/s and K/s²
at the K/K_B = 1 points of the matrix would be
expected in accordance with findings in previous
studies of system and operator parameters [McRuer
(5)].

2. Trends with gain are provided for six Y_c's; three
controlled element forms have five gain levels, and
the remaining three Y_c's fill in between the extremes
of equalization required with three gain levels.
Thus an adequate range of element forms exists for
a dynamic gain range of 100.

3. Variation of parameters with input bandwidth
and amplitude can be extrapolated to all of the
forms from the 0.1K_B/s, K_B/s, 10K_B/s, 0.1K_B/s²,
K_B/s², and 10K_B/s² points.

4. A good range of each of the myriad system and
operator parameters is obtained.

The  configurations  of  Table  XIII  were  thus  considered  to  adequately 
represent  the  single-loop  tasks  of  interest. 


TABLE  XIII 


INPUT MATRIX FOR CONFIGURATIONS
(See Table XIV for Input Key)

CONTROLLED ELEMENT, Y_c

CONTROLLED ELEMENT GAIN, K/K_B

0.1  0.5  1  5  10


/  Sj.2—  \ 

\cm-sec  ,  6 e/ 


a,a,b,c,c 
d,e,f  ,g 


K/s  a,c  a  >> 

K/s(s+4)  a  a 

K/s(s  +  2)  a 

K/s(s  + 1 )  a, a  a,i 

K/s²  a,b  a

K/s(s  —  1)  a 

K/(s  —  2)  a 

K/[s² + 2(0.7)7.8s + 7.8²]  a  a

K/[s² + 2(0.7)16s + 16²]  a  a  a


a  a,b,c 


a,b  I  a 


a,a,b,c 

d,e,f,g 


a  a,b,c 


O.586 

2.15 

2.15 

2.15 

1.17 

1.075 

5.45 

8.38 

35.2 


*n = exponent of free s in denominator of Y_c. K_B = K_best as
determined in an independent set of trials.


TABLE  XIV 


KEY  TO  INPUTS  OF  TABLE  XIII 


CODE FROM TABLE XIII

INPUT BANDWIDTH, ω_i (r/s)

INPUT AMPLITUDE (cm rms)


0.5 

1.5 

*0.5 


c. Secondary Task (Lateral). In an attempt to find a good correlate
with pilot opinion, a workload measure of some sort was considered extremely
desirable, primarily because a vehicle evaluator invariably expresses some
subjective impressions regarding "attention," etc., when in a given rating
situation. Experimenters have made numerous workload related measures with
secondary tasks such as the extinguishing of lights, mental exercises,
tracking tasks, etc. [e.g., Gaul (57)] and have met with varying degrees
of success. A number of difficulties are apparent:

1.  The  scores  obtained  from  secondary  tasks  (such  as  number 
of  lights  turned  out,  etc.)  are  difficult  to  relate  to 
any  measurable  characteristics  of  the  system  because 
most  are  discrete  in  nature.  Those  tasks  which  are  con¬ 
tinuous  have  no  analytical  tie  with  system  parameters. 

2.  The  scores  are  quite  variable  since  they  depend  highly 
upon  the  subject's  motivation  and  the  performance 
requirements  of  the  task. 

3. If it is attempted to force the operator to his
capacity via a technique which paces the difficulty of the
secondary task, the primary task generally is neglected
in favor of the secondary task.

To overcome these difficulties, an unstable tracking task, called the
"critical task" [see Jex, et al. (22)], was used as a secondary loading task
and was mechanized such that it could not become the primary task when the
operator was near capacity. The use of the critical task offered the
advantages of having an easily adjustable unstable root which is somewhat
proportional to task difficulty and is related directly to the operator's
time delay while tracking. Thus, although it was not the objective of
this program, a workload theory involving system parameters could be evolved
at a suitable time. Here we wanted to find an objective measure which was
sensitive to handling qualities and thus could be correlated with pilot
opinion.

The mechanization scheme used was similar to that proposed by Kelly (23).
The difficulty of the secondary task was made proportional to primary task
performance. Thus, when the operator was keeping primary system error
(performance) less than a criterion value, the secondary difficulty
increased. When the operator was so busy with the secondary task that
primary error was larger than the criterion value, the secondary difficulty
decreased. The final level of difficulty was determined by the sensitivity
of the primary task performance to loading by the secondary task. The
results of the experiment will show if this "sensitivity" is a determining
factor of pilot opinion.
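
The cross-coupling rule described above can be illustrated with a simple
discrete-time update. This is a minimal sketch and not the analog
mechanization of Fig. 12: the linear rate law, the gain, the criterion
value, and the lower limit on difficulty are all assumptions chosen only
to show the direction of the adjustment.

```python
def update_difficulty(lam, abs_error, criterion, gain, dt, lam_min=0.0):
    """Advance the secondary-task difficulty lam by one time step.

    The difficulty rises while the magnitude of the primary error is below
    the criterion and falls while it is above; `gain` (an assumed constant)
    sets how quickly it moves, and lam is not allowed below lam_min.
    """
    lam_rate = gain * (criterion - abs_error)
    return max(lam_min, lam + lam_rate * dt)


if __name__ == "__main__":
    lam = 1.0
    # Hypothetical error history: small errors first, then large ones
    history = [0.2] * 50 + [1.5] * 50
    for err in history:
        lam = update_difficulty(lam, err, criterion=0.5, gain=0.4, dt=0.1)
    print(round(lam, 3))
```

With a rule of this form the difficulty settles where the primary error
hovers about the criterion, so the final level reflects how sensitive the
primary task is to the added load.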

The secondary task was prevented from becoming the primary task by
giving the following instructions to the subject: "Your objective is
to get the highest secondary task score you can. To get a high score
you must keep the primary task error very small. If you allow the
primary error to get large, your score will decrease. The problem
will stop if either primary or secondary tasks are allowed to exceed
the display limits."

A block diagram of both primary and secondary tasks is shown in
Fig. 12. To avoid any confusion over the definition of workload (i.e.,
is it physiological or psychomotor workload?), the parameter λ shown in
the figure was assumed to be related to the "attention level" required
of the operator.


Figure 12. Single-Loop Primary Task
with Secondary Cross-Coupled Loading Task

2. Procedure


The configurations were presented in a randomized sequence to
minimize halo effects (recall that a halo effect is the tendency
of a subjective response to be influenced by the preceding stimulus-
response pair). Repeats were placed carefully in the sequence to
balance out time of day and halo effects. A detailed run log is
given in Appendix C (Table C-I). The time required for each
configuration was eight minutes. The pilot actually had to perform two
tasks sequentially. First he was asked to track longitudinally to minimize
the pitch error. During this tracking period, which lasted 120 seconds, he
was asked to formulate his opinion of the configuration based on the task
performance criterion. If more time was required to form an opinion, it
could be taken after the 120-second recorded run. Approximately 15 seconds
were allowed before each run for the subject to reach steady-state tracking
so that the first 100-second portion of the 120-second run would be suitable
to use for describing function computations and performance measures. At
the completion of the 120-second run, the pilot was asked to write down the
deserved ratings on his clipboard forms. He was not allowed to "play" with
the configuration because his ratings would then be based on characteristics
other than those specified in the task definition. The rating scales used
and the task definitions are given in the next subsection.

The second task of the sequence was the determination of the secondary
loading task score. This was a multi-axis task where the primary task
was still pitch tracking, but now the pilot had the additional task of
controlling the unstable element in roll as discussed in Section IV.B.1.c.
This very difficult combination generally consumed about 2 minutes. Thus
the total 8-minute (approximately) run might follow the sequence shown in
Fig. 13.

At the beginning and end of each day calibration runs were made,
and a series of secondary task (lateral) alone trials were made to
determine the critical (maximum) secondary score attainable under
no-load conditions.

Figure 13. Typical Run Sequence for Each Experimental Configuration

a. Measures Obtained During Each Run. In addition to ratings obtained
from the pilot, a large amount of objective data was taken. Some data
were tape recorded for later use in describing function calculations. Strip
chart recordings were made to determine the performance levels attained
while tracking and to provide possible clues to the causes of the resulting
ratings. A digital voltmeter was used to read out numerous performance
measures sequentially. The variables recorded in the trials are given
in Table XV. One of the variables given, the EMG signal, is perhaps not
self-explanatory. The EMG, or electromyograph, was used in the
experiments to obtain an indication of neuromuscular effort, which could
then be correlated with pilot rating. Probes were attached to the pilot's
right triceps and were monitored continuously during the experiments. The
pre-experiment calibrations included an EMG calibration (stick force versus
EMG amplifier output).

TABLE XV

RECORDED MEASURES

CHANNEL   VARIABLE

BRUSH STRIP CHART
   1      θc, Pitch command input
   2      θe, Primary task error
   3      δe, Pilot's stick output
   4      θ, Pitch output
   5      Xs, Secondary task score
   6      ∫e² dt, Error performance
   7      dXs/dt, Secondary score rate (see Fig. 12)
   8      EMG/(0.05s + 1), Muscle tension

[Recorder label illegible]
   9      Timing signal, Master reference timing signal
  10      [entry illegible]
  11      EMG, Unfiltered myograph signal
  12      Step, Identifies 120-second portion of run

[Recorder label illegible]
   1      ∫|θc| dt, Performance measure
   2      ∫|θe| dt, Performance measure
   3      ∫|δe| dt, Performance measure
   4      ∫|θ| dt, Performance measure
   5      ∫EMG/(0.05s + 1) dt, Performance measure


C. RATING SCALES

Using the phrases for which scale values were determined in
Section III, scales were constructed to solicit opinion from the pilot
during the experiment. A "global" scale was constructed using the
degrees of goodness of handling qualities. Opinion was also solicited
about the specific traits of "Response Characteristics," "Control,"
"Demands on the Pilot," and "Effects of Deficiencies." To enable a
comparison with already existing scales, Cooper ratings (Ref. 1) and
Cooper-Harper ratings (Ref. 3) were also obtained. The number of
ratings required of the pilot was thus considerable, but it was
found that the 3" X 5" cards containing the scales could be flipped
through quickly once the pilot became familiar with them (with the
exception of the Cooper-Harper scale, which was presented on 8-1/2" X 11"
paper, as shown). The scales are shown in Table XVI, where the number
in the upper right-hand corner of each box represents its position in
the sequence of presentation.

Two pilots participated in the experiments. One was an engineer-pilot;
the other was a pilot from the Aerospace Test Pilots' School at Edwards
AFB. The instructions to the pilots are repeated in Appendix C.

TABLE XVI

RATING SCALES USED IN EXPERIMENTS

PILOT OPINION RATING SCHEDULE

             Adjective        Numerical                                                  Primary mission   Can be
Operation    rating           rating      Description                                    accomplished      landed

Normal       Satisfactory     1           Excellent, includes optimum                    Yes               Yes
operation                     2           Good, pleasant to fly                          Yes               Yes
                              3           Satisfactory, but with some mildly             Yes               Yes
                                          unpleasant characteristics

Emergency    Unsatisfactory   4           Acceptable, but with unpleasant                Yes               Yes
operation                                 characteristics
                              5           Unacceptable for normal operation              Doubtful          Yes
                              6           Acceptable for emergency condition only¹       Doubtful          Yes

No           Unacceptable     7           Unacceptable even for emergency condition¹     No                Doubtful
operation                     8           Unacceptable - dangerous                       No                No
                              9           Unacceptable - uncontrollable                  No                No


TABLE XVI (Continued)

Answer the following questions in order:

                                                          Yes   No
1. Is the vehicle controllable during the task?            □    □
2. Is the vehicle acceptable for the task? (May have       □    □
   deficiencies which warrant improvement, but is
   adequate for the task.)
3. Is the vehicle satisfactory for the task? (i.e.,        □    □
   adequate for the task without improvement.)


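HANDLING QUALITIES [presentation-order number illegible]

[Scale graduated from 0 to 10; descriptor positions other than 9 and 10 are not recoverable. Descriptors, best to worst:]

      Excellent
      Highly desirable
      Good, pleasant
      Fair
      Bad
      Very bad
 9    Nearly uncontrollable
10    Uncontrollable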

TABLE XVI (Continued)

RESPONSE CHARACTERISTICS [presentation-order number illegible]

[Scale graduated to 10; descriptor positions other than 9 and 10 are not recoverable. Descriptors, best to worst:]

      Excellent, pure (i.e., no accidental excitation) primary and secondary response characteristics
      Good, relatively pure, primary and secondary response characteristics
      Fair, somewhat impure, primary or secondary response characteristics
      Quite sensitive, sluggish, or uncomfortable in primary or secondary responses
      Extremely sensitive, sluggish, or uncomfortable in primary or secondary responses
 9    Nearly uncontrollable
10 □  Uncontrollable
   □  Not applicable


CONTROL [presentation-order number illegible]

[Scale graduated to 10; descriptor positions other than 9 and 10 are not recoverable. Descriptors, best to worst:]

      Extremely easy to control with excellent precision
      Very easy to control with good precision
      Easy to control with fair precision
      Controllable with somewhat inadequate precision
      Controllable, but only very imprecisely
      Difficult to control
      Very difficult to control
 9    Nearly uncontrollable
10 □  Uncontrollable
   □  Not applicable



TABLE XVI (Continued)

DEMANDS ON PILOT [presentation-order number illegible]

[Scale graduated to 10; descriptor positions other than 9 and 10 are not recoverable. Descriptors, best to worst:]

      Completely undemanding, very relaxed and comfortable
      Largely undemanding, relaxed
      Mildly demanding of pilot attention, skill, or effort
      Demanding of pilot attention, skill, or effort
      Very demanding of pilot attention, skill, or effort
      Completely demanding of pilot attention, skill, or effort
 9    Nearly uncontrollable
10 □  Uncontrollable
   □  Not applicable


EFFECTS OF DEFICIENCIES [presentation-order number illegible]

[Scale graduated from 0 to 10; only the descriptors below are legible, and positions other than 8, 9, and 10 are not recoverable. Descriptors, best to worst:]

      Effects of deficiencies on performance is easily compensated for by pilot
      Moderately objectionable deficiencies
 8    Major, very objectionable deficiencies
 9    Nearly uncontrollable
10 □  Uncontrollable
   □  Not applicable



SECTION V


ANALYSIS OF THE DATA

A.  INTRODUCTION  TO  THE  PILOT  MODEL 

The  correlations  which  will  be  made  in  this  section  will  include 
parameters  of  the  pilot  model,  so  a  very  brief  summary  of  the  model  is 
in  order  here.  A  complete  and  detailed  study  of  the  techniques  used  to 
derive  the  model  and  the  intricacies  of  parameter  adjustment  can  be  found 
in  McRuer  (5). 

The  simple  crossover  model  of  a  pilot/vehicle  combination  for  a  wide 
variety  of  controlled  elements  has  been  shown  to  be 


$$\left[ Y_p Y_c(j\omega) \right]_{\omega = \omega_c} = \frac{\omega_c \, e^{-j\omega\tau_e}}{j\omega} \tag{26}$$


when the operator is performing a compensatory tracking task. The elements
are defined as

Yp = the pilot describing function

Yc = the controlled element or vehicle transfer function

ωc = the system crossover frequency, i.e., the frequency
     where |YpYc| = 1

τe = the effective time delay, i.e., the high-frequency
     transport lag characteristics observed in the human
     operator while tracking. It includes the neural conduction
     time delay, cerebral computation times, and the limb
     dynamics time constant.

The model is a frequency domain description of the open-loop system
characteristics in the region of crossover and in the presence of sinusoidal,
random-appearing inputs. The model given by Eq. 26 describes only the linear
behavior of the operator, i.e., that portion of the system output which
is correlated with the input. An operator also generates an output that
is uncorrelated with the input. This "noisy" portion is called the remnant,
and is defined to include all pilot output power not correlated with the
input.
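
As an illustration of the crossover model of Eq. 26, the following minimal sketch evaluates the open-loop amplitude and phase at a few frequencies. The numerical values of the crossover frequency and effective time delay are hypothetical, chosen only to show the form of the response, and the function names are not taken from the original computations.

```python
import cmath
import math


def crossover_open_loop(omega, omega_c, tau_e):
    """Open-loop response YpYc(j*omega) of the crossover model (Eq. 26)."""
    return omega_c * cmath.exp(-1j * omega * tau_e) / (1j * omega)


def amplitude_db(response):
    """Amplitude ratio of a complex response, in dB."""
    return 20.0 * math.log10(abs(response))


def phase_deg(omega, tau_e):
    """Unwrapped phase of Eq. 26 in degrees: -90 deg from 1/(j*omega) plus the delay lag."""
    return -90.0 - math.degrees(omega * tau_e)


# Hypothetical values, for illustration only.
OMEGA_C = 4.0  # crossover frequency, rad/sec
TAU_E = 0.2    # effective time delay, sec

for omega in (1.0, 2.0, 4.0, 8.0):
    response = crossover_open_loop(omega, OMEGA_C, TAU_E)
    print(f"omega = {omega:4.1f} rad/sec  "
          f"amplitude = {amplitude_db(response):6.1f} dB  "
          f"phase = {phase_deg(omega, TAU_E):7.1f} deg")
```

The amplitude ratio is unity (0 dB) at the crossover frequency and falls at 20 dB/decade, while the time delay adds phase lag in proportion to frequency.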

The particular values of the two parameters, ωc and τe, which would
be exhibited in a manual control situation depend on numerous factors,
including the input characteristics, the form of the controlled element,
the nature of the environment (e.g., fixed- or moving-base simulation),
motivation, and the nature of the task. By following the adjustment rules
given by McRuer (5), the parameters can be closely estimated.

A pilot model corresponding to the crossover model of Eq. 26 for the
controlled elements used in this study is

$$Y_p = K_p \, \frac{(T_L j\omega + 1)}{(T_I j\omega + 1)} \, e^{-j\omega\tau_e} \tag{27}$$

where Kp = the pilot gain

TL, TI = the lead, lag equalization time constants
         generated internally by the pilot

Note that in order for the crossover model of Eq. 26 to correctly describe
the total open-loop system, the pilot must exactly cancel any lead or lag
in the controlled element near the crossover region. Available data
indicate that he is able to do so, except when the controlled element is
the "critical task" of Jex (22). There he is constrained to a behavior
which causes him to appear nearly as a gain with a transport lag,
so that the total open loop does not have the usual form of Eq. 26.
Equations 26 and 27 will be used to fit the data of this study.
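
The cancellation requirement can be checked numerically. The sketch below assumes a controlled element of the form Kc/[s(s + a)] and a pilot who places his lead at TL = 1/a with no lag; under those assumptions the product of Eq. 27 and the controlled element reduces to the crossover model of Eq. 26. All numerical values and function names are hypothetical.

```python
import cmath


def pilot(omega, k_p, t_lead, t_lag, tau_e):
    """Pilot describing function in the lead-lag-plus-delay form of Eq. 27."""
    s = 1j * omega
    return k_p * (t_lead * s + 1.0) / (t_lag * s + 1.0) * cmath.exp(-s * tau_e)


def controlled_element(omega, k_c, a):
    """Yc = Kc / [s(s + a)], a rate command element with a first-order lag."""
    s = 1j * omega
    return k_c / (s * (s + a))


def crossover_model(omega, omega_c, tau_e):
    """Open-loop crossover model of Eq. 26."""
    s = 1j * omega
    return omega_c * cmath.exp(-s * tau_e) / s


# Hypothetical values, for illustration only.
K_C, A = 2.15, 2.0             # controlled element gain and lag break, rad/sec
OMEGA_C, TAU_E = 4.0, 0.2      # crossover frequency (rad/sec) and delay (sec)
T_LEAD = 1.0 / A               # lead placed to cancel the element lag
K_P = OMEGA_C * A / K_C        # pilot gain that puts crossover at OMEGA_C

for omega in (1.0, 4.0, 10.0):
    open_loop = pilot(omega, K_P, T_LEAD, 0.0, TAU_E) * controlled_element(omega, K_C, A)
    difference = abs(open_loop - crossover_model(omega, OMEGA_C, TAU_E))
    print(f"omega = {omega:4.1f} rad/sec  |YpYc - Eq. 26| = {difference:.2e}")
```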


B. VALIDITY OF THE DESCRIBING FUNCTION DATA


1. Computational Approach


Describing functions of the pilot and of the total open loop were computed
using a digital routine (BOMM, Ref. 35) which determined the ratios
of the Fourier coefficients of the appropriate time series. Some spectral
densities and statistical measures were also computed. The describing
functions of interest are given by

$$Y_p(j\omega_i) = \frac{\delta_{e_i}}{e_i} \tag{28}$$

and

$$Y_p Y_c(j\omega_i) = \frac{\theta_i}{e_i} \tag{29}$$

where δei, ei, and θi are the Fourier coefficients at the ith frequency
for the elevator deflection (pilot output), system error (pilot input),
and system output.
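
The ratio-of-Fourier-coefficients calculation can be sketched as follows. This is not the BOMM routine; it is a minimal illustration that assumes a single input sinusoid with an integral number of cycles in the run and a synthetic "pilot" consisting of a pure gain and time delay. The run length, sample interval, and pilot values are hypothetical.

```python
import cmath
import math


def fourier_coefficient(samples, dt, omega):
    """Complex Fourier coefficient of a sampled time series at frequency omega."""
    n = len(samples)
    return (2.0 / n) * sum(
        x * cmath.exp(-1j * omega * k * dt) for k, x in enumerate(samples)
    )


def describing_function(output, input_, dt, omega):
    """Ratio of output to input Fourier coefficients at one input frequency."""
    return fourier_coefficient(output, dt, omega) / fourier_coefficient(input_, dt, omega)


# Hypothetical run: 100 sec at 20 samples/sec, 30 input cycles in the run.
RUN_LENGTH, DT = 100.0, 0.05
N = int(round(RUN_LENGTH / DT))
OMEGA = 2.0 * math.pi * 30.0 / RUN_LENGTH   # about 1.88 rad/sec

# Synthetic pilot: gain of 2 and a 0.2 sec delay (assumed values).
GAIN, DELAY = 2.0, 0.2
error = [math.sin(OMEGA * k * DT) for k in range(N)]
stick = [GAIN * math.sin(OMEGA * (k * DT - DELAY)) for k in range(N)]

y_p = describing_function(stick, error, DT, OMEGA)
print(f"|Yp| = {abs(y_p):.4f}, phase = {math.degrees(cmath.phase(y_p)):.2f} deg")
```

With noise-free synthetic data the computed ratio recovers the assumed gain and the phase lag of the delay; with experimental data the same ratio is meaningful only at the input frequencies, where the signal is coherent.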

An example of the results of the routine is shown in Figs. 14 and 15
for Yc = K/s. The plot of Φee and Φδδ (power spectra of e and δ) shows
the difference between coherent power (at input frequencies) and uncorrelated
power (or noise) by denoting the coherent power with the circular
symbol. Thus it can be seen that the signal-to-noise ratio was a problem
at the lower input frequency of Φee. Because the signal at that frequency
was obviously contaminated with noise, the corresponding describing function
points were marked "unreliable" with a flag in Fig. 15. All of the
describing functions of the experiment were treated in a similar manner
and are included in Appendix C. Generally, the lower three to five frequency
points were found to be unreliable.

The  mid-  and  high-frequency  describing  function  data  appear  to  have 
been  calculated  from  high  quality  (noise-free)  experimental  data.  It  is 
these  data  that  should  be  compared  between  controlled  element  forms  for 
internal  consistency,  and  with  the  previous  work  of  McRuer  (5)  for 
compatibility. 

The describing function data are included in Appendix C along with a
tabulation of the fitted parameters and rating data. Because of the economics
involved, describing function data could be computed only for the
single-loop runs of pilot JDM. The rating data for pilot MDK, however, are
included in Appendix C.

2. Compatibility of Effective Time Delay, Input,
Crossover Frequency, and Phase Margin Effects

The points selected in the experiment for comparison with past work
were, from Table XIII, Yc = KB/s and KB/s², where KB is the "best" gain
as determined in a brief preliminary trial. Several input combinations
were used to allow a check of variation with input bandwidth. Figure 16
shows the effects of ωi on τe at the intermediate input amplitude of
1 cm rms. Plotted also are the comparable curves from McRuer (5). It
is seen that the trends are consistent, i.e., that although the slopes
are different, the values of τe are very nearly the same. The difference
in slopes could be accounted for by the differences in the control axis
and stick (pitch/center-stick versus roll/side-stick), but the differences
are considered small enough to show that the τe trend is compatible and
consistent.

Further checks are provided by crossover frequency and phase margin
trends with input bandwidth. Figure 17 shows a comparison of crossover
frequency trends. The agreement with Ref. 5 is [text illegible in source]. The regression
phenomenon can be seen for the acceleration command element, i.e., the
operator actually reduces the error magnitude by reducing his gain and
bandwidth slightly when the input bandwidth is large. Figure 18 shows a
comparison of phase margins. The agreement is excellent for K/s², but
rather poor for the K/s elements. Since the high-frequency phase is so
sensitive to the τe curve fit, the comparison does not indicate a
countertrend and is thus considered inconclusive.

An interesting alternative way to look at the τe data exists which
should prove useful in the estimation of τe. McRuer (5) shows a dependence
of τe on the form of the controlled element as well as the input bandwidth.
The τe seemed to depend on the equalization generated internally by the
pilot. A concise method of portraying the equalization can be obtained
by defining a parameter which is sensitive to both lead and lag. One
such parameter is the slope of the pilot's amplitude ratio at the crossover
frequency, where the choice of crossover frequency reflects that the
pilot is most sensitive to characteristics at crossover during tracking
[see, for example, McDonnell (27) or Ashkenas (21)]. Data were assembled
from this study and from McRuer (5) to test such a parameter. Figure 19
shows the effective time delay for several controlled elements plotted as
a function of L, where

$$L = \frac{1}{20} \left( \text{slope of } |Y_p| \text{ in dB/decade} \right)_{\omega = \omega_c} \tag{30}$$

Figure 17. Variation of Crossover Frequency with Input Bandwidth
[Abscissa: ωi (rad/sec)]

Figure 18. Variation of Phase Margin with Input Bandwidth

Figure 19. Variation of Effective Time Delay
with Slope of Pilot Equalization
[Abscissa: L, crossover lead required (units). Legend, partly legible: KB/s(s + 4); KB/[s² + 2(0.7)(16)s + (16)²]; K...]


Thus, for Yc = K/s, no lead would be required and L = 0. For
Yc = K/s², lead is generated at a very low frequency, so L = 1. It is
apparent that a remarkable pair of curves results which is a function
of ωi, the input bandwidth. Such a family has considerable potential as
an aid in estimating τe. The limited amount of data shown in Fig. 19 also
further demonstrates compatibility with past work.
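
The two limiting values quoted above can be reproduced with a short calculation. The sketch assumes that L is the slope of the pilot's amplitude ratio at crossover expressed in units of 20 dB/decade, and that the pilot has the lead-lag form of Eq. 27, for which the slope of each first-order factor follows analytically. The function name and the numerical values are hypothetical.

```python
def equalization_slope_units(omega_c, t_lead, t_lag):
    """Slope of |Yp| at crossover in units of 20 dB/decade.

    For a first-order factor (T*j*omega + 1), the slope of the amplitude
    ratio in dB/decade is 20*(T*omega)**2 / (1 + (T*omega)**2); a lead
    factor contributes positively and a lag factor negatively.
    """
    lead = (t_lead * omega_c) ** 2 / (1.0 + (t_lead * omega_c) ** 2)
    lag = (t_lag * omega_c) ** 2 / (1.0 + (t_lag * omega_c) ** 2)
    return lead - lag


OMEGA_C = 4.0  # rad/sec, assumed crossover frequency

# Yc = K/s: no equalization required.
print(equalization_slope_units(OMEGA_C, 0.0, 0.0))

# Yc = K/s^2: lead generated at very low frequency (TL = 5 sec assumed).
print(equalization_slope_units(OMEGA_C, 5.0, 0.0))

# Lead break placed exactly at crossover gives half a unit.
print(equalization_slope_units(OMEGA_C, 1.0 / OMEGA_C, 0.0))
```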

3. The Relationship Between the "Best" Gains

It was hypothesized by McDonnell (27) that, for a given tracking task,
the selection of the "best" gain for the controlled element is based on
the amplitude ratio of the element at crossover, i.e., where |YpYc| = 1.
Thus, if the gain at crossover for one form of controlled element is known,
the gains for other forms should be estimable by setting crossover amplitudes
equal. Since the best gains were determined by pilot JDM for
several forms, the hypothesis can be checked with the data of this study.
Table XVII lists the data necessary to compute crossover gains together
with the computed values. If the gain of the subcritical task is excluded

TABLE XVII

AMPLITUDE OF THE CONTROLLED ELEMENT AT CROSSOVER

Yc/KB                            ωc (rad/sec)    KB       |Yc(jωc)| (dB)

1/s                              4.0             0.986    -16.7
1/s(s + 4)                       4.0             2.15     -20.5
1/s(s + 2)                       4.0             2.15     -18.4
1/s(s + 1)                       3.4             2.15     -15.0
1/s²                             4.0             1.17     -22.9
1/(s - 2)                        4.7             3.45     -3.4
1/[s² + 2(0.7)(7.8)s + (7.8)²]   4.5             8.38     -13.2
1/[s² + 2(0.7)(16)s + (16)²]     3.1             35.2     -16.9

(it obviously is not comparable with the others), the mean of the gains,
in dB, is -18 dB, with extremes of ±4.8 dB. This is regarded as reasonable
support for the hypothesis since the gains which could be selected were in
discrete steps allowing an uncertainty of approximately ±50 percent in the
final setting. At the very worst, close estimates to the best gain can be
made if the crossover model is valid. Connections between the hypothesis
and the subcritical task gain are not known at this time.
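
The crossover amplitudes can be recomputed from the tabulated best gains and crossover frequencies. The sketch below does so for four rows of Table XVII, those for 1/s(s + 4), 1/s(s + 2), 1/s(s + 1), and 1/(s - 2), and also averages the seven tabulated dB values that remain when the subcritical task is excluded. The function and variable names are hypothetical; the numbers are those printed in the table.

```python
import math


def crossover_amplitude_db(k_b, omega_c, denominator):
    """Amplitude ratio of KB/denominator(s) at s = j*omega_c, in dB."""
    s = 1j * omega_c
    return 20.0 * math.log10(abs(k_b / denominator(s)))


# (crossover frequency in rad/sec, best gain KB, denominator of Yc/KB)
rows = {
    "1/s(s+4)": (4.0, 2.15, lambda s: s * (s + 4.0)),
    "1/s(s+2)": (4.0, 2.15, lambda s: s * (s + 2.0)),
    "1/s(s+1)": (3.4, 2.15, lambda s: s * (s + 1.0)),
    "1/(s-2)": (4.7, 3.45, lambda s: s - 2.0),
}

computed = {
    name: crossover_amplitude_db(k_b, omega_c, den)
    for name, (omega_c, k_b, den) in rows.items()
}
for name, value in computed.items():
    print(f"{name:10s} {value:6.1f} dB")

# Tabulated amplitudes, subcritical task excluded.
tabulated_db = [-16.7, -20.5, -18.4, -15.0, -22.9, -13.2, -16.9]
mean_db = sum(tabulated_db) / len(tabulated_db)
print(f"mean = {mean_db:.1f} dB, "
      f"range = {min(tabulated_db):.1f} to {max(tabulated_db):.1f} dB")
```

The recomputed values for these four rows agree with the tabulated amplitudes to within about 0.1 dB, and the mean of the seven tabulated values rounds to the -18 dB quoted in the text.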

C. CORRELATIONS OF PILOT RATING WITH THE
EXPERIMENTALLY DETERMINED PARAMETERS

Approximately  50  compensatory  pitch  tracking  runs  were  made  by  each 
of  the  two  pilots.  Since  eight  different  rating  scales  were  used  by  the 
pilot  for  each  run  (see  Table  XVI),  and  approximately  a  dozen  parameters 
were  measured  during  a  run,  a  selective  correlation  will  have  to  be  made 
for  reasons  of  economy.  Because  of  the  wide  familiarity  with  the  Cooper 
rating,  it  will  be  used  to  make  the  initial  correlations  with  system  and 
pilot  parameters.  Correlations  between  ratings  can  then  be  made  to  test 
the  selectivity  and  sensitivity  of  the  individual  trait  ratings.  Any 
special  trends  which  look  promising  can  then  be  brought  out  explicitly 
by  returning  to  a  correlation  of  the  individual  rating  scale  with  the 
system  parameter  of  interest.  The  number  of  cross  plots  can  thus  be  kept 
to  a  minimum  without  running  the  risk  of  missing  key  trends. 

Some of the data presented will be redundant because of the functional
dependence of several parameters. Thus, for example, plots of pilot gain
and controlled element gain versus ratings would be identical because the
adaptive nature of the operator results in KpKc = ωc = constant. Nevertheless,
since we are looking for consistency and the widest applicability
possible, all pertinent parameters will be considered.
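
The redundancy can be made concrete with a small sketch. It assumes a controlled element of the form Kc/s, for which the crossover model gives KpKc = ωc, and a crossover frequency that is held exactly constant; both are idealizations, and the numerical values are hypothetical.

```python
def adapted_pilot_gain(omega_c, k_c):
    """Pilot gain that keeps Kp*Kc equal to the crossover frequency (Yc = Kc/s assumed)."""
    if k_c <= 0.0:
        raise ValueError("controlled element gain must be positive")
    return omega_c / k_c


OMEGA_C = 4.0   # rad/sec, assumed constant crossover frequency
K_B = 1.0       # hypothetical "best" controlled element gain

for ratio in (0.1, 0.5, 1.0, 5.0, 10.0):
    k_c = ratio * K_B
    k_p = adapted_pilot_gain(OMEGA_C, k_c)
    print(f"Kc/KB = {ratio:5.1f}  Kp = {k_p:6.2f}  Kp*Kc = {k_p * k_c:4.1f}")
```

Because the product is fixed, a plot against Kp carries the same information as a plot against Kc with the abscissa reversed on a logarithmic scale.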

1. Correlation of Pilot Rating with Pilot Parameters

a. Variation of Pilot Rating with Gain. The operator is capable of
adapting over a very large dynamic gain range with little change in performance,
so the pilot's opinion of various gains is of extreme importance.
Figure 20 shows the results for a dynamic range of 100. A preliminary set
of trials was carried out to determine KB. The gain was then varied
between 0.1 KB and 10 KB. Since KpKc = ωc = constant for a given Yc form,
either Kc or Kp can be plotted to show the desired trends. The selected
parameter for Fig. 20 was the ratio of the controlled element gain to the
previously determined "best" controlled element gain. The resulting
trends in Fig. 20a show the expected dome shapes. It is interesting to
note that there appears to be a "family compatibility" with all but the
second-order complex pair element results. The opinion trend for all
elements seems to deteriorate more quickly for the high controlled element
gains. The comment was made during the series that the results of an
inadvertent stick motion with the high element gain were considerably more
disagreeable for all controlled elements than the large stick displacements
(and forces) necessary with the low element gains. On the other
hand, when a low element gain was used with the complex pair, extremely
large stick forces had to be held, whereas with the other forms the large
input excursions could be integrated out. The rapid deterioration of
opinion for the complex pair element is therefore quite reasonable.

Figure 20b shows the Cooper ratings obtained from the other pilot,
MDK, for gain variations. As mentioned earlier, describing functions
could not be computed for MDK because of the limited funds available.
However, the rating data shown give us a hint as to the kind of problems
introduced by pilot "set." MDK was obviously a much less sensitive
rater, i.e., he was reluctant to make fine distinctions between configurations.
His comments indicated that he preferred to base his ratings on
the category descriptors as much as possible because the finer distinctions
were not clear. No further implications can at present be drawn
from these data, but they are included so that a data base will be started
for future studies of pilot set. Additional MDK data are included on
"trait" ratings in a subsequent subsection.

b. Variation of Ratings with Effective Time Delay. Pilot parameters,
including the effective time delay, were read from curve fits of the describing
function data. A tabulation of the parameters, as well as the
curve fits themselves, is shown in Appendix C. Figure 21 shows the
effects on opinion of the effective time delay. The gains for all elements
were optimum. The variation looks quite linear with the exception
of the subcritical task, which contains a nonminimum phase pole. Recall
that it is the subcritical task which cannot be fitted with the simple
crossover model with any great success, and which has a constraining
effect on the pilot.

Figure 20. The Variation of Pilot Rating
with Controlled Element Gain for Two Pilots

The effective time delay is affected by two factors. The effect of
the equalization generated by the pilot on τe is shown by the solid
line in Fig. 21. Input bandwidth effects are shown by the dashed line.
The carpet plot of Fig. 21 sums up the relation between both equalization
required and input bandwidth quite neatly, so that if used in conjunction
with Fig. 19, estimates of ratings should be improved.

c. Variation with Equalization. Prior to the experiments of this
study, very little data existed where lead was measured at the same time
the ratings were taken. Thus the majority of the connections between
lead and ratings were inferred [see, for example, Ashkenas (21)]. A
compounding problem was the uncertainty about the lead placement. It
has been assumed in most recent work that the pilot exactly canceled
controlled element lag with his lead generation over an approximate range
of 0.1 < TL < 5 sec. Thus lead equalization relationships with pilot
ratings have been abundant, but also questionable.

The data points of Fig. 22 overcome the two shortcomings noted above.
Best gains were used on all configurations, and the bandwidth and amplitude
of the input were held fixed. Scrutiny of the describing function data in
Appendix C will reveal that |YpYc| does indeed look like K/s over the frequencies
where the lag occurs, indicating that the pilot does cancel the
lag with his lead. It was necessary to infer only one lead value: that
for Yc = K/s². It has been shown in McRuer (5) that in that case
TL = 5 sec, which places the lead break frequency (1/TL = 0.2 rad/sec) below
the lowest frequency that can be resolved with one or two runs.

A comparison of the rating data with previous data [for example,
Ashkenas (21)] shows that the difference in ratings between K/s² and
K/s(s + 1) is not as large in the current series as would be expected. A
difference of only one Cooper unit was obtained here as compared with
[value illegible in source] units elsewhere. This compatibility problem cannot
be resolved because of the already mentioned uncertainties in the older data,
together with a lack of documentation regarding task, control stick, motivation,
simulator quality, etc. For example, opinion is thought to be very sensitive
to crossover frequency when the lead is near crossover. Thus, in the
current series, a wider difference in ratings would probably have resulted
if the pilot had lowered his gain slightly.

Figure 21. Variation of Pilot Rating
with Effective Time Delay

Figure 22. Variation of Pilot Rating with Pilot Lead

2. Correlation of Ratings with Closed-Loop Parameters

There  are  myriad  closed-loop  parameters  which  could  be  computed,  but 
perhaps  three  are  of  significance  in  identifying  trends.  A  measure  of 
the  "tightness"  of  the  loop  closure  is  provided  by  the  crossover  frequency, 
and  we  have  previously  maintained  that  it  remains  essentially  invariant 
with gain changes (see p. 73). It would therefore be instructive to check
it.  Stability  margins  and  performance  are  also  of  interest.  Since  phase 
margin  is  used  almost  universally,  it  is  appropriate  to  use  it  here. 
Finally,  performance  could  conceivably  influence  pilot  ratings  to  a  high 
degree,  hence  an  error  measure  will  be  computed  and  checked. 

a. Crossover Frequency Trends. It was shown in Fig. 17 that the
crossover frequency, ωc, is essentially invariant with input bandwidth,
ωi. Checks of ωc as a function of gain are also available for K/s and
K/s². Shown in Fig. 23 are the crossover frequencies for several gains
(the 0.1 KB/s² describing function calculations had an extremely poor
signal-to-noise ratio, hence ωc was not available for it). The change in
ωc due to gain is seen to be about 1 rad/sec over a dynamic range of 100.
With such a small variation, it is a foregone conclusion that a correlation
between ratings and ωc would be poor.

b. Correlation of Phase Margin and Ratings. The phase margins for
the best gain configurations are plotted in Fig. 24. With the exception
of the subcritical task, the ratings vary fairly linearly with phase margin.
It could be argued that the pilot is downgrading the configurations
because of his increasing discomfort with the lowering stability margins.
It could also be argued that the "cause" is the requirement to equalize.
Since pilot comments were of no help, it is pointless to speculate about
cause and effect. However, the phase margin can be written as

$$\varphi_M = \frac{\pi}{2} - \tau_e \omega_c \tag{31}$$

We have seen that crossover frequency is approximately constant as a
function of gain, and that a small incremental difference exists between
forms, so we would expect that φM will vary inversely with τe (decreasing
linearly as τe increases). A comparison
of Figs. 21 and 24 shows that to be the case.
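
The dependence of phase margin on effective time delay follows directly from the crossover model of Eq. 26: the open-loop phase at crossover is -π/2 - τe ωc, so the margin from -π is π/2 - τe ωc. The sketch below evaluates this for a few hypothetical values of τe at a fixed, assumed crossover frequency.

```python
import math


def phase_margin_deg(tau_e, omega_c):
    """Phase margin of the crossover model, pi/2 - tau_e*omega_c, in degrees."""
    return math.degrees(math.pi / 2.0 - tau_e * omega_c)


OMEGA_C = 4.0  # rad/sec, assumed constant across configurations

for tau_e in (0.10, 0.15, 0.20, 0.25, 0.30):
    print(f"tau_e = {tau_e:.2f} sec  phase margin = {phase_margin_deg(tau_e, OMEGA_C):5.1f} deg")
```

With ωc fixed, each additional 0.05 sec of effective time delay removes the same increment of phase margin, which is why a rating trend that is linear in τe would also appear linear in phase margin.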

Figure 23. Variation of Crossover Frequency with Gain

Figure 24. Variation of Cooper Rating with Phase Margin

c. Performance and Ratings. The pilots were instructed to rate the
configurations in the context of the task, where the task specification
included a performance error specification. The resulting objective
measures of performance should thus be interesting to compare with the
ratings. Performance was computed by measuring the average absolute
value of the system error, i.e.,

$$\overline{|e|} = \frac{1}{T} \int_0^T |e(t)| \, dt$$

where T is the run length. Figure 25 presents further evidence that the
crossover characteristics stay approximately constant as a function of gain.
Performance, then,
gives no indication of the rating changes due to gain for a given form.
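
The average absolute error is simple to compute from a sampled error record. The following minimal sketch assumes uniformly spaced samples and trapezoidal integration, and uses a synthetic sinusoidal error of unit amplitude (for which the exact average is 2/π); it is not the procedure used with the digital voltmeter readings, and the run length and sample interval are hypothetical.

```python
import math


def mean_absolute_error(error_samples, dt):
    """Time average of |e(t)| by trapezoidal integration over the run."""
    if len(error_samples) < 2:
        raise ValueError("need at least two samples")
    area = 0.0
    for previous, current in zip(error_samples, error_samples[1:]):
        area += 0.5 * (abs(previous) + abs(current)) * dt
    run_length = dt * (len(error_samples) - 1)
    return area / run_length


# Hypothetical run: 100 sec sampled every 0.01 sec, 30 cycles of error.
RUN_LENGTH, DT = 100.0, 0.01
OMEGA = 2.0 * math.pi * 30.0 / RUN_LENGTH
N = int(round(RUN_LENGTH / DT)) + 1
error = [math.sin(OMEGA * k * DT) for k in range(N)]

print(f"average |e| = {mean_absolute_error(error, DT):.4f}")
```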


Figure 25. Performance Variation with Gain
[Abscissa: K/KB, from 0.1 to 10. Legend: K/s; K/s²; K/[s² + 2(0.7)(16)s + (16)²]. Run label: JDM: B6"-1.88-1]


On the other hand, Fig. 26 shows that there is a direct correlation of
performance  and  ratings  between  the  "best"  gain  configurations  of  several 
controlled  element  forms.  Shown  in  the  figure  are  four  data  points  for 
input  bandwidths  other  than  1.88  rad/sec.  The  correlation  for  the  low 
ratings  is  seen  to  be  quite  good. 

If the pilot is really rating partly on performance, a look at the
actual magnitude of the error could prove interesting. Figure 27 shows
the absolute value of the system error averaged over the 100 sec run
length for K/s and K/s², and with the three input levels. The correlation
is excellent. The regression line is identical to the line in Fig. 26,
but for the sake of clarity the two figures have been kept separate.

Figure 26. Correlation of Rating with Performance

Shown also in Fig. 27 is the performance criterion value specified in
the task definition. For this particular pilot, the intersection seems
to be at about a rating of three on the Cooper scale.

d. Connections Between Remnant and Ratings. The pilot's stick output
power can be considered to be the sum of the power which is correlated
with the system input (the linear portion) and the uncorrelated power, or
noise, which is by definition the remnant. The relative remnant, ρa², is
defined as the ratio of the correlated power to the total power, or

$$\rho_a^2 = \frac{\text{correlated output power}}{\text{total output power}}$$

Thus, when the operator is introducing only a small amount of noise,
whether through nonlinearities, time variations, or noise injection,
ρa² will be nearly unity. When the operator's output is all noise,
ρa² will be zero. Since the amount of remnant in the system could have
a significant effect on pilot ratings, the relative remnant was computed
simultaneously with the describing functions.

Figure 27. Influence of Error Magnitude on Ratings
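
The ratio of correlated power to total power can be illustrated with a synthetic stick signal. The sketch assumes a single input frequency with an integral number of cycles in the run, takes the correlated power to be that of the Fourier component at the input frequency, and stands in for the remnant with a sinusoid at a non-input frequency. All signal values and names are hypothetical.

```python
import cmath
import math


def fourier_amplitude(samples, dt, omega):
    """Amplitude of the Fourier component of a sampled signal at frequency omega."""
    n = len(samples)
    coefficient = (2.0 / n) * sum(
        x * cmath.exp(-1j * omega * k * dt) for k, x in enumerate(samples)
    )
    return abs(coefficient)


def relative_remnant(samples, dt, input_frequencies):
    """Ratio of power at the input frequencies to total (mean-square) power."""
    total_power = sum(x * x for x in samples) / len(samples)
    correlated_power = sum(
        fourier_amplitude(samples, dt, omega) ** 2 / 2.0 for omega in input_frequencies
    )
    return correlated_power / total_power


# Hypothetical run: 100 sec at 20 samples/sec.
RUN_LENGTH, DT = 100.0, 0.05
N = int(round(RUN_LENGTH / DT))
W_INPUT = 2.0 * math.pi * 30.0 / RUN_LENGTH   # input frequency
W_NOISE = 2.0 * math.pi * 70.0 / RUN_LENGTH   # stand-in for remnant power

stick = [
    2.0 * math.sin(W_INPUT * k * DT) + 1.0 * math.sin(W_NOISE * k * DT)
    for k in range(N)
]
rho = relative_remnant(stick, DT, [W_INPUT])
print(f"ratio of correlated to total power = {rho:.3f}")
```

Here the correlated component has power 2.0 and the uncorrelated component 0.5, so the ratio is 0.8; a purely linear operator would give unity and an operator producing only noise would give zero.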

The variation of the relative remnant was investigated as a function
of four key parameters: the controlled element gain; the effective time
delay which, it will be recalled, reflects the equalization required of
the pilot; the amplitude of the system input; and the bandwidth of the
system input. Figures 28a and b show the effects of controlled element
gain on the remnant, and the correlation of rating with the remnant. The
trend of ρa² with gain demonstrates that the pilot performs more linearly
with larger stick excursions when the element is K/s, but that his performance
with K/s² is approximately one-half noise and is little affected
by gain. The corresponding rating results, Fig. 28b, show little correlation
with remnant, indicating that the remnant variation with gain is
probably not a primary causal factor of the rating variations.

The relation of remnant and $\tau_e$ is the most interesting of the quartet.
The configurations all have best gains and the same input, so only the form
differences are influencing the remnant. The straight line shown in Fig. 29
fits the data reasonably well, with the exception of the subcritical task
point. It will be remembered that this is the case which is not adequately
described by the crossover model. Thus it could be argued that a measure
of task difficulty, at least for a comparison of different forms, is given
by the relative remnant. It is felt, however, that $\tau_e$ is considerably
more direct and can be estimated, so it is the more desirable of the two
measures to apply to the rating problem.

The effects of the input are shown in Figs. 30 and 31. It is apparent
that no direct or significant correlations exist, which leads to the
conclusion that it is the effects of the input on other parameters (namely,
$\tau_e$ and performance, as we have seen) that cause the deterioration in
rating.

The remnant data presented in Figs. 28 through 31 are consistent with
McRuer's (5) data, with the possible exception of the variation with gain
for $K/s^2$. McRuer found a definite decrease in $\rho_a^2$ with increasing
gain, while this study notes a slight increase in $\rho_a^2$. The data have
been carefully checked, so the discrepancy must remain unexplained.

3. Correlation of Ratings with the Environment

It has been emphasized several times to this point that the configuration
must be rated in the context of the task in order for the ratings to
be valid indicators of the vehicle suitability for the task. We would
thus expect ratings to be dependent on the environment, or system input
in our case, as well as the configuration and task specification. Results
supporting this contention have already been noted, where we have seen
changes in rating as a function of $\tau_e$, for example, which can be, in
turn, almost totally dependent on the input bandwidth (see the dashed lines
in Fig. 23). Here we shall plot the input effects directly, which is just
an alternate way of looking at the data.

Figure 28. Variation of Ratings with Remnant as a Function of Gain

Figure 29. Variation of Ratings with Remnant as a Function of Controlled Element Form

Figure 30. Variation of Rating with Remnant as a Function of Input Amplitude

Figure 31. Variation of Ratings with Remnant as a Function of Input Bandwidth

The data show, in Fig. 32, that for small amplitude inputs the bandwidth
must be increased to fairly large values before the pilot is appreciably
affected. As the amplitude is increased, however, the pilot becomes
very sensitive to bandwidth. This phenomenon could be a manifestation of
the indifference threshold discussed in McRuer (5). When a good deal of
lead is being generated, as with the $K/s^2$, an increase of $\omega_i$ from
1.88 to 2.89 rad/sec caused an increment in ratings of 2.3 to 3 units.

Figure 32. The Effect of Input Characteristics on Ratings (abscissa: $\omega_i$, rad/sec)
4. Correlation of Ratings with Secondary Task Score

As detailed in Section IV, a secondary loading task, in the form of
an unstable roll tracking task, was utilized as a measure of pilot
attention required to maintain primary task performance, or the "excess
capacity" the pilot has for performing other tasks while maintaining
primary performance. The scores obtained from the cross-coupled secondary
task represent its degree of difficulty; consequently, they also represent
the "degree of ease" of the primary task.

Secondary scores were obtained for all configurations and inputs, and
have been correlated with ratings in various ways. Figure 33 shows how
the scores for the best gain configurations of each form compare with the
Cooper ratings. The agreement is extremely good. Even the subcritical
task, which has been a notable culprit in other correlations, seems to fit
in linearly with the other data. Recall that a $\lambda_s = 0$ corresponds to
100 percent of the pilot's attention being focused on the primary task,
while a $\lambda_s = 5.5$ means that no attention is required to maintain
primary performance.

Figure 33. Secondary Task Score Variation with Ratings for Best-Gain Configurations

The effects of gain variation are shown in Fig. 34. Here again, the
correlation is remarkable. The data point for $Y_c = 0.5K_B/s^2$ is considered
either to have been rated incorrectly or set up incorrectly on the computer,
since the rating assigned was considerably better than the "best" rating,
i.e., the rating for $K_B/s^2$.

Figure 34. Variation of Secondary Task Score with Controlled Element Gain
The effects of changing the input parameters are seen in Fig. 35.
The scatter has increased somewhat, but agreement is still good. The
entire experiment has been plotted for subject JDM in Fig. 36. Of the
45 configurations, 73 percent are within one Cooper rating of the
regression line.

Figure 36. Secondary Task Scores for All Configurations and Inputs
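
The "within one Cooper rating of the regression line" statistic can be
sketched as follows. This is only an illustration under stated assumptions:
the rating is regressed on the secondary score by ordinary least squares,
the function names are hypothetical, and the sample values are made up
rather than taken from Fig. 36.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fraction_within(x, y, slope, intercept, tolerance=1.0):
    """Fraction of points whose rating lies within `tolerance` of the line."""
    hits = sum(
        1 for xi, yi in zip(x, y)
        if abs(yi - (slope * xi + intercept)) <= tolerance
    )
    return hits / len(x)


# Hypothetical secondary scores and Cooper ratings (illustrative only).
scores = [0.5, 1.5, 2.5, 3.5, 4.5, 5.0]
ratings = [8.0, 7.0, 4.0, 3.5, 2.5, 1.5]

slope, intercept = fit_line(scores, ratings)
print(slope, intercept)
print(fraction_within(scores, ratings, slope, intercept))
```

Measuring the deviation along the rating axis matches the way the
statistic is quoted in the text, in Cooper units.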

5. Correlation of Ratings with Neuromuscular Tension

Past experience [McDonnell (29), McRuer (5)] with difficult tasks has
indicated that in many cases the pilot becomes extremely tense, that is,
exhibits a high degree of neuromuscular tension. It was hypothesized that
this tension, or effort, would be a chief determiner of pilot rating,
since "effort" or "work" invariably comes up in any discussion of handling
qualities ratings. Thus the pilots were instrumented with electromyograph
(EMG) probes to attempt to measure such a parameter. The most sensitive
area on the arm was determined to be the triceps, where electrodes were
attached and monitored during the runs.
The results showed that neuromuscular activity increased only as
controlled element gain decreased, as would be expected. Average tension
level, as a function of element form (and consequently as a function of
$\tau_e$), appears indeterminate, as is shown in Fig. 37. Especially
surprising was the relatively low value for the subcritical task, which was
expected to be the largest in view of subjective comments made during other
experiments (McDonnell, Ref. 29). It is apparent that the average tension
level is perhaps a less reliable indicator of limb activity than measures of
external performance, such as average stick motion, while its significance
as a measure of pilot rating in terms of internal effort is doubtful. It
is concluded from the data that the average internal tension is not a
primary causal factor in pilot ratings.

Figure 37. Average Tricep Tension While Tracking (axis label: Corresponding Average Pushing Force at Grip, lbs)

6.  Comparison  of  Cooper  and  Cornell  Ratings 

A limited amount of rating data, heretofore unpublished, was taken in
1963 as part of a large program (McRuer, Ref. 5). The pilot used and
compared the Cooper scale and the Cornell scale for two configurations,

Yc  =  K/s  and  K/s(s— o^).  It  would  be  of  interest  to  compare  those  data 
with the results of the current series. The task carried out was
compensatory tracking, where a laterally moving dot was controlled with a
roll side stick. The pilot interpreted an inch of lateral dot displacement as
30 deg of bank angle. The criterion, or performance required for the
task, is not clear quantitatively, but the pilot considered the task to
be approximately straight and level cruising flight. It is interesting
to note that the pilot felt that he had to maneuver the configuration in
an open-loop fashion without an input in addition to the compensatory
tracking before he would give a rating. Thus, in terms of the structure
evolved in Section II of this report, he was rating on an undefined
combination of tasks.

The plotted Cooper rating data of Figs. 38 and 39 are taken from
Table D-I in Appendix D. The Cornell ratings shown in the figures are
not tabulated. Comments made by the pilot indicated that he felt the two
scales were identical at the good end and were approximately a point
different at the bad end, with the Cornell rating being the larger of the
two. Figure 39 reflects the point difference between the scales in the 6 to
10 region. No comments were made about the midranges, but Fig. 38 shows that
the difference between the scales there increases somewhat linearly with
the ratings.

An  interesting  observation  on  variability:  Fig.  38  shows  a  marked 
increase  in  scatter  for  the  poorer  ratings,  thus  supporting  our  earlier 
findings  regarding  the  sensitivity  of  the  rating  scales. 

A comparison between the earlier data and the ratings obtained in
this study was made by normalizing the gain of the earlier controlled
element. The differences, shown in Fig. 40, are quite dramatic. A
plausible explanation is the difference in tasks. As noted earlier,
RH was rating on the basis of a qualitative cruise-like condition, and
based his ratings in part on open-loop, no-input characteristics.
Although the differences are not conclusively due to task definition,
the importance of making a complete and concise specification of the
task can be appreciated.

Figure 38. A Comparison of Cooper and Cornell Ratings for $Y_c = K/s$ (both axes: Pilot Rating)

Figure 39. A Comparison of Cooper and Cornell Ratings for an Unstable Second-Order Controlled Element


Figure 40. A Comparison of Cooper Ratings for Two Tasks and Two Pilots


D. CONNECTIONS BETWEEN EXPERIMENTALLY MEASURED RATINGS

In addition to the many parameters obtained from the describing
functions, several ratings were taken for each configuration. The scales
selected are given in Section IV.C, and included the Cooper scale, the
revised Cooper scale, a "Handling Qualities" scale, and four "trait"
rating scales. The "Handling Qualities" scale (HQ) was intended to
overcome some of the difficulties of the Cooper scale by providing a
continuous sequence of compatible descriptors across the entire scale. The
trait ratings were solicited with the hope that they would provide
specific information to the experimenter on the nature of the deficiencies.
The connections between these ratings will be examined subsequently.

Because of the large amount of interest shown in the "Cooper
boundaries," i.e., the divisions between satisfactory and unsatisfactory
(3.5) and between acceptable and unacceptable (6.5), the experiment was
designed to test their existence and stability by the following
procedure:

•  The Cooper rating was solicited for the configuration.

•  Another card was presented with the questions:

   1. Is the vehicle controllable during the task?

   2. Is the vehicle acceptable for the task? (May have
      deficiencies which warrant improvement, but is
      adequate for the task.)

   3. Is the vehicle satisfactory for the task? (i.e.,
      adequate for the task without improvement.)

Upon scrutiny of the data it was apparent that the experiment would not
yield meaningful results because the short-term retention of the pilot
enabled him to rate consistently between both ratings. Thus, in the
entire experiment with both subjects, no variation was found in the
"boundary" versus Cooper ratings. The boundary ratings will therefore
not be considered further.
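
The consistency check that the boundary experiment was meant to provide
can be sketched as follows. This is a minimal illustration, assuming only
the two boundary values quoted above (3.5 and 6.5) and that numerically
lower Cooper ratings are better; the function names and the sample answers
are hypothetical. The "controllable" question is omitted because no
numerical boundary for it is quoted in this section.

```python
SATISFACTORY_BOUNDARY = 3.5  # satisfactory / unsatisfactory division
ACCEPTABLE_BOUNDARY = 6.5    # acceptable / unacceptable division


def implied_boundary_answers(cooper_rating: float) -> dict:
    """Answers to the card questions implied by a numerical Cooper rating."""
    return {
        "acceptable": cooper_rating < ACCEPTABLE_BOUNDARY,
        "satisfactory": cooper_rating < SATISFACTORY_BOUNDARY,
    }


def is_consistent(cooper_rating: float, card_answers: dict) -> bool:
    """True when the pilot's yes/no answers agree with the numerical rating."""
    implied = implied_boundary_answers(cooper_rating)
    return all(card_answers[key] == implied[key] for key in implied)


# Hypothetical example: a rating of 5 with "acceptable but not satisfactory".
print(is_consistent(5.0, {"acceptable": True, "satisfactory": False}))
```

In the experiment reported here every pair of ratings would pass such a
check, which is why the boundary data carry no additional information.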

It was concluded that in order for such an experiment to yield valid
results, pilots would have to be used who had no previous knowledge of
the Cooper scale, and each configuration would have to be presented twice,
once for each rating. The Cooper scale would need to be modified so as
not to include the boundary adjectives, but only the descriptors and
numerical values. A comparison could then be made between the boundary
ratings and the descriptors. Unfortunately, the experiment would be
quite lengthy.

1. Comparison of Cooper, Handling Qualities, and Cooper-Harper Ratings

A comparison of the Cooper ratings with the Cooper-Harper ratings for
all configurations showed that with one pilot (JDM), out of 57 ratings,
3 were 1 unit different, 16 were 0.5 unit different, and the rest were
identical. With the other pilot (MDK), out of 84 ratings, 2 were 3 units
different, 10 were 2 units different, and 70 were 1 unit different, leaving
only two with no difference at all. In virtually all the configurations
where differences between the two ratings did occur, the Cooper-Harper
rating was the larger (worse) of the two, indicating a possible slight bias
toward the bad side. It is obvious that the bias is a function of the
pilot, since pilot MDK had an essentially fixed difference of 1 unit. The
cause of the bias is unknown, especially in light of the fact that the
satisfactory-unsatisfactory/acceptable-unacceptable boundaries are
identical in both scales. In the subsequent discussion, no distinction will
be made between the Cooper and Cooper-Harper ratings, thus reducing the
number of plots required.
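
The size of each pilot's bias can be estimated from the tallies quoted
above. The sketch below assumes that every nonzero difference has the same
sign (Cooper-Harper worse), which the text states holds in "virtually all"
cases, so the result is an upper estimate of the mean signed difference.
The function name is hypothetical.

```python
def mean_difference(tally: dict, total_ratings: int) -> float:
    """Mean rating difference, counting untallied ratings as zero difference."""
    counted = sum(tally.values())
    if counted > total_ratings:
        raise ValueError("tally contains more ratings than the stated total")
    weighted_sum = sum(diff * count for diff, count in tally.items())
    return weighted_sum / total_ratings


# Tallies taken from the text: {difference in rating units: number of ratings}.
jdm_tally = {1.0: 3, 0.5: 16}            # out of 57 ratings
mdk_tally = {3.0: 2, 2.0: 10, 1.0: 70}   # out of 84 ratings

print(round(mean_difference(jdm_tally, 57), 2))  # 0.19
print(round(mean_difference(mdk_tally, 84), 2))  # 1.14
```

The two means, roughly 0.2 and 1.1 units, illustrate the statement that
the bias depends strongly on the pilot.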

In Section III, the semantic relationship between the various
phrases, including Cooper's, was determined and is shown in Figs. 41 and 42
as the "Line of Semantic Agreement," i.e., the calibration between Cooper
ratings and the $\psi$ scale that was found from the semantic experiment
described in Section III and given by Eq. 8. The actual ratings obtained in
the simulation are plotted and can be compared to the calibration line.
The numbers on the data points indicate how many identical ratings were
obtained. Bear in mind that the calibration line is a theoretical
relationship based on data obtained from a semantic experiment, whereas the
data points are actual rating data. As such, the "true" ratings are
unknown and are best estimated from the data. The differences between
the data and the calibration line are definitely one-sided. A possible
explanation for this is found by returning to the original questionnaires
(see Appendix A). There it can be seen that both pilots used in the
experiments were more pessimistic than average, which could explain the
bias noted in the plots.

If a pilot introduces a systematic variability in all ratings, the
effect on the data of Figs. 41 and 42 would be to slide the data points
down the calibration line (or parallel to it if a bias is present). If
the pilot has a purely random variance (as he seems to have in the
semantic experiment as determined by comparing the scores to overall means)
the observed variance could be as large as the variance noted in the
semantic experiment, i.e., the square of the discriminal dispersion.

The discriminal dispersion has been shown by the dashed lines in
Figs. 41 and 42. Virtually all of the data are contained by these lines,
which indicates: 1) the bias present in each pilot's ratings is within
$1\sigma$ of the average pilot, and 2) it appears that most of the
variability is due to semantics and not to the evaluation process. Remember
that we are not considering the variability of ratings for repeated
configurations or bias differences between pilots, but only the relative
semantic ambiguity between the Cooper descriptors and the Handling Qualities
descriptors.
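
The containment check described above can be sketched as follows. The
calibration function, the dispersion value, and the data points are all
placeholders (Eq. 8 and the plotted data are not reproduced in this
section), so the numbers carry no experimental meaning; the sketch only
shows the form of the test.

```python
def fraction_inside_band(points, calibration, dispersion):
    """Fraction of (cooper, scale_value) points within one dispersion
    of the calibration line, measured along the scale-value axis."""
    points = list(points)
    if not points:
        raise ValueError("no data points supplied")
    inside = sum(
        1 for cooper, scale_value in points
        if abs(scale_value - calibration(cooper)) <= dispersion
    )
    return inside / len(points)


def placeholder_calibration(cooper_rating: float) -> float:
    """Stand-in for Eq. 8; NOT the calibration derived in Section III."""
    return 0.5 * cooper_rating


# Hypothetical (Cooper rating, scale value) pairs.
hypothetical_points = [(2.0, 1.2), (4.0, 2.5), (6.0, 3.9), (8.0, 5.3)]
print(fraction_inside_band(hypothetical_points, placeholder_calibration, 1.0))
```

A fraction close to one corresponds to the observation that virtually all
of the data fall between the dashed lines.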

It  is  concluded  on  the  basis  of  these  data  that  our  earlier  findings 
that  the  Cooper  scale  becomes  more  sensitive  at  the  bad  end  are  correct, 
and  that  in  an  actual  rating  situation  the  resolution  capability  of  the 
pilot  is  being  taxed  beyond  its  power  when  significance  is  placed  on 
differences  of  1  Cooper  unit  with  only  a  few  observations  at  the  bad  end 
of  the  scale. 

The fact that there is semantic consistency in the ratings of two
pilots does not mean that they will closely agree upon the merits of a
particular vehicle. It is an indicator of the level of confidence that
can be placed on resultant ratings, considering also the pilot's "set"
(how his preference is affected by training, experience, etc.) and
sensitivity to vehicle parameter changes (how his deterioration in ratings
is affected by motivation, ability, and self-assessment of performance).

The priority of attributes to be possessed by a pilot is fairly clear.
It is of absolute importance that the pilot have a good ability to use
words. Unfortunately, the administering of a test which would give data
similar to that of Figs. 41 and 42 is not at all an easy matter. One
alternative is to use the conventions of the past, i.e., choose pilots
who have a strong educational and technical background. The participants
of the semantic survey were all carefully chosen. Out of 67 raters, 4 had
to be disregarded because of glaring inconsistencies. Since it was
impossible to check the causes, it was assumed that lack of motivation or
a misunderstanding of the instructions was the most likely cause, not
an inability with words.

Another  alternative  would  be  to  construct  a  very  limited  version  of 
the  semantic  survey  (maybe  ten  key  phrases)  to  administer  to  possible 
rating  candidates.  Criteria  could  be  established  for  acceptance  or 
rejection  of  the  rater  based  strictly  on  verbal  ability.  The  candidate 
would  also  be  required  to  have  the  education,  background,  and  experience 
appropriate  to  the  rating  task. 

Considering  the  results  of  the  survey,  it  is  doubtful  that  such  a 
screening  is  necessary  if  raters  do  have  the  appropriate  background  and 
are  thoroughly  motivated. 

2. Comparison of Global Ratings with Trait Ratings

In addition to the global ratings (as overall ratings are often called,
i.e., Cooper, Handling Qualities, Cooper-Harper), ratings of Response
Characteristics, Control, Demands on Pilot, and the Effects of
Deficiencies were obtained. The phrases used were those previously scaled in
Section III.C and shown in Table XVI. The intent of such trait ratings
was that they would very likely be closely related to physical
characteristics of the vehicle or system and thus aid the engineer in
determining the appropriate improvement, or at least in identifying the
problem.

Table XVIII shows some anticipated interactions between the traits and
several important pilot, vehicle, and system parameters. As an example,
if the controlled element form and input are held fixed in a closed-loop
tracking task, but the vehicle gain is changed, we know that pilot rating
will change (Fig. 20), but that performance in terms of what the pilot
sees will remain constant. Thus, as a function of gain, it was anticipated
that the rating of "Response Characteristics" would remain approximately
constant, while the ratings of "Ease of Control" and "Demands on the Pilot"
would vary widely.
TABLE XVIII

ANTICIPATED PILOT/VEHICLE SYSTEM CORRELATES FOR TRAITS

Some anticipated pilot, vehicle, and system parameter correlates for
open-loop maneuvers and for closed-loop tracking:

| TRAIT | OPEN-LOOP MANEUVERS | CLOSED-LOOP TRACKING |
| 1. Response Characteristics (RC) | Vehicle numerator and denominator time constants, $T_L$, $T_I$, command input | [illegible], system bandwidth, remnant level ($\Phi_{nn}$) |
| 2. Ease and Precision of Control (C) | Vehicle damping and natural frequency, stick characteristics, $K_c$ (or $K_p$) | [illegible], $\Phi_{nn}$, $e(t)$, stick characteristics, $K_c$ (or $K_p$), $T_L$, $T_I$ |
| 3. Demands on Pilot (DP) | Complexity of open-loop response to command input, stick characteristics | $\tau_e$, $T_L$, $T_I$, $K_p$, $K_c$ |
| 4. Effects of Deficiencies on Performance (ED) | Overshoot, rise time, settling time | [illegible] |


Figures 43, 44, and 45 show a summary of the results of the trait
ratings for both pilots. Observations of a general nature are that:

1. There is a somewhat uniform trend between the Handling
   Qualities (HQ) ratings and the corresponding trait
   ratings, i.e., all traits seem to suffer when the
   global rating deteriorates.

2. When there is disagreement between pilots on the overall
   adequacy of the configuration for the task, the
   contributing factors are reflected by all of the traits.
   A pilot "set," then, seems to be exhibited by all of
   the traits. This could mean that (a) the traits measure
   independent features of the vehicle which all vary a
   similar amount, or that (b) the traits are all
   describing the same phenomenon.


In some specific instances, lack of consistency can be observed.
Figure 44a shows that one pilot rated a low-gain configuration much less
demanding than the high-gain case, even though it took as much as 100 times
the stick travel and force to obtain equivalent performance. Figure 45
shows that for the complex-pair controlled element, the demands on the
pilot were rated in opposite directions by the two pilots.

Figure 43. Variation of Trait Ratings with Gain for JDM
[Panels: $Y_c = K/s$; $Y_c = K/s^2$; $Y_c = K/[s^2 + 2(.7)(16)s + (16)^2]$. Each panel shows the Handling Qualities, Response Characteristics, Control, Demands on Pilot, and Effects of Deficiencies ratings for gains $K_B$, $10K_B$, and $0.1K_B$.]

Figure 44. Variation of Trait Ratings with Gain for MDK
[Panels (a), (b), and (c) for pilot MDK, with the same rating categories and gain legend ($K_B$, $10K_B$, $0.1K_B$); panel (c) is $Y_c = K/[s^2 + 2(.7)(16)s + (16)^2]$.]

Figure 45. Controlled Element Form Effects on Trait Ratings
[Legend: $K_B/s$, $K_B/s^2$, and $K_B/[s^2 + 2(.7)(16)s + (16)^2]$ for pilots JDM and MDK, with the same rating categories.]

Taking  into  consideration  the  observed  trends  and  inconsistencies, 
the  usefulness  of  the  trait  ratings  as  supplementary  indicators  appears 
to  be  very  limited.  The  connections  between  the  traits  and  specific 
parameters  were  originally  intended  to  be  investigated  via  computerized 
correlation  and  factor  analysis  techniques.  However,  on  the  basis  of  the 
results  of  Figs.  43,  44,  and  45,  it  is  concluded  that  a  considerably 
larger  population  of  pilots  would  need  to  be  sampled  before  any  useful 
results  could  be  obtained.  The  scaled  trait  descriptors  could  be  used, 
however,  to  construct  a  specialized  global  scale,  should  an  experimenter 
need  one. 

A possible alternative to the scaled trait ratings would be Osgood's
(30) semantic differential type of rating scale, where the extremes of
several subjective qualities are presented to the pilot and he is forced
to select some degree of goodness of each by placing a mark on the line
joining the two extremes. The disadvantage of such a technique is that
no meaningful numerical values can be assigned the resultant ratings.
Perhaps  a  fruitful  area  of  research  would  be  the  use  of  psychometric 
methods  to  scale  the  data  obtained  in  semantic  differential  or  forced 
choice  form  in  a  display  evaluation  (3*0,  for  example. 

At  this  time  it  must  be  reluctantly  concluded  that  the  scaled  trait 
ratings  are  of  no  apparent  value  in  pointing  out  areas  of  deficiencies  to 
the  engineer. 

E. GENERAL APPROACH TO RATING ESTIMATES

Because of the lack of data pertaining to pilot "set," or individual
differences between pilots, it is premature to attempt to construct a
pilot rating model. However, it is felt that the data are sufficient to
enable estimates of increments of ratings due to vehicle and environmental
changes. The general approach is outlined below.

Because of the complex nature of pilot adaptation, caution is
absolutely necessary when attempting to anticipate a rating for a given
configuration. The two primary questions that must be answered are: 1) what
performance can the pilot attain relative to that specified, and 2) how
near to his adaptation limits is the pilot while maintaining the
performance. The first question is answered by conducting an analysis of the
pilot/vehicle system. In the case of compensatory tracking, the
adjustment rules of McRuer (5) generally provide a good estimate of the
overall performance that can be expected. If performance is worse than that
specified in the definition of the task, decrements in rating similar to
those shown in Fig. 27 would be expected.

The second question can be answered by estimating the individual pilot
parameters. If the crossover model of the operator is valid, pilot ratings
would be expected to be proportional to the effective time delay, Fig. 21,
which in turn reflects both equalization and input effects. If the
crossover model is not suitable, as in the subcritical task, a more detailed
analysis would be in order to determine if the operator is near his
limits. The effects of a regression (i.e., increase) of $\tau_e$ with a
large input bandwidth were not investigated in the present experiments.
The pilot also has definite preferences for control stick
characteristics. If his preferred gain is known, the decrement due to
nonoptimum gains can be predicted from Fig. 20.

The question is always asked, "What does it mean when the sum total
of all of these effects indicates a rating far worse than the worst on
the scale (say, a Cooper rating of 20)?" The answer is simply that the
scale is not absolute, but only relative. Ratings must therefore be
truncated at 9, which is somewhat analogous to admitting that most home
thermometers would not yield a correct measure of 0° Kelvin!
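
The incremental approach and the truncation just described can be sketched
as follows. This is an illustration only: the function name, the baseline,
and the increments are hypothetical, the lower limit of 1 is an assumption
about the best Cooper rating, and the upper limit of 9 follows the
truncation value quoted above. In practice the increments would be read
from Figs. 20, 21, and 27.

```python
def estimate_rating(baseline, increments, best=1.0, worst=9.0):
    """Add rating increments to a baseline and truncate to the scale limits."""
    raw_estimate = baseline + sum(increments)
    return max(best, min(worst, raw_estimate))


# Hypothetical increments for nonoptimum gain, equalization, and performance.
print(estimate_rating(3.0, [1.5, 2.0, 0.5]))    # 7.0, inside the scale
print(estimate_rating(3.0, [6.0, 7.0, 4.0]))    # 9.0, raw sum of 20 truncated
```

The truncation reflects the relative nature of the scale; a raw sum beyond
the limit says only that the configuration is at the bad end of the scale.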

Hopefully, rating variations have been shown with enough pilot and
system parameters to enable the engineer to estimate relationships with
confidence and with a minimum of analysis. A significant amount of work
remains to be accomplished, however. The next section will detail
recommendations to further improve the state-of-the-art, and will summarize
the many conclusions reached throughout the study.
The study program described herein has led to a large number of very
interesting findings, which can be drawn together in this section to
form a fairly complete picture of the current state of rating technology.
The findings lend themselves to a natural division into two categories.
The first part of this study was aimed at the problems of rating scales
themselves, and led to a somewhat separate and independent set of
conclusions. It will be discussed first. Then the effects of the physical
system on ratings can be discussed.

A.  SUMMARY OF RATING SCALE FINDINGS

Rating  scales  are  subjective  in  nature  and  therefore  are  scales  of 
comparison.  As  such,  they  should  have  no  absolute  values  associated 
with  them.  The  use  of  rating  scales  will  result  in  such  phenomena  as 
pilot  biases  due  to  personal  preferences  based  on  training,  experience,  and 
general  background;  differences  due  to  interpretation  of  the  objectives  of 
the  rating  situation;  and  biases  and  variability  due  to  deficiencies  in  the 
scale  itself.  The  first  source  of  bias  can  be  minimized  by  careful  planning 
and  definition  of  the  criteria  used  in  the  experiment;  the  second  and  third 
are amenable to analysis and improvement.

A  considerable  amount  of  effort  was  devoted  to  the  interpretation 
problem  in  Section  II,  where  "ground  rules"  regarding  definitions  of 
missions, tasks, etc., were established. Thus, the bias due to such
factors  can  be  minimized,  and  the  interchangeability  and  consistency 
of  experimental  data  should  be  much  improved. 

The  problems  with  the  scale  itself  were  noted  in  Section  II,  and 
attacked  in  earnest  in  Section  III.  An  application  of  psychometric 
methods yielded a set of scaled descriptors showing that:

1. There is an underlying psychological dimension,
or continuum (called the ψ scale herein), which
has a constant subjective sensitivity along its
length. A measure of the sensitivity is called
the "discriminal dispersion," and is essentially
the standard deviation of the resolving power of
raters to distinguish semantic differences in
language. The constant sensitivity yields an
interval scale, where the intervals are units
related to noticeable semantic differences. The
interval nature of the dimension allows ratings
to be averaged, which has heretofore been
mathematically inappropriate.

2. The Cooper scale (1) and Cooper-Harper scale (3)
are very nearly functionally related to the ψ
dimension. The error introduced by averaging
Cooper ratings, rather than their ψ equivalents,
is small provided enough trials have been made
to ensure confidence in the ratings (see next
paragraph).

3. The Cooper and Cooper-Harper scales are shown to
be overly sensitive at the bad ends, so that
attaching significance to a difference of one
Cooper unit between ratings at the bad end would
require a relatively large number of trials.

4. The results of the current experiments show an
internal consistency between the Cooper phrases,
ψ values, and Cooper ratings to such an extent
that it is concluded that a scale based on
the ψ-scale values would solve many of the
problems which currently exist. Such a scale
might appear as shown in Fig. 46. There, "degrees
of goodness" of handling qualities are distributed
along a 7-point scale, which has a uniform
sensitivity along its length. The scale shown would be
called a "global" scale, since it integrates all
deficiencies into the one descriptor "handling
qualities." Specialized scales could be similarly
constructed by using the catalog of scaled
phraseology included in this report.
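
The averaging point in items 1 and 2 can be illustrated with a short sketch. The Cooper-to-ψ lookup below is entirely hypothetical (it merely crowds the ψ values together toward the bad end, as item 3 suggests); the function names are likewise illustrative and are not taken from the study's digital program.

```python
from bisect import bisect_right

# HYPOTHETICAL lookup: Cooper ratings 1-9 and made-up psi values that
# crowd together toward the bad end. These are NOT the report's values.
COOPER = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
PSI = [1.0, 2.2, 3.4, 4.6, 5.6, 6.5, 7.2, 7.8, 8.2]


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation over increasing xs, clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # now xs[i - 1] <= x < xs[i]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def average_via_psi(cooper_ratings):
    """Average on the interval (psi) scale, then map the mean back to Cooper units."""
    psi_values = [interpolate(r, COOPER, PSI) for r in cooper_ratings]
    psi_mean = sum(psi_values) / len(psi_values)
    return interpolate(psi_mean, PSI, COOPER)


print(average_via_psi([3, 8]))
```

With this made-up lookup, ratings of 3 and 8 average to about 5.0 when the mean is taken on the ψ scale, rather than the arithmetic 5.5; the size of any such discrepancy in practice depends on the actual scale values.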

The  choice  of  a  7-point  scale  is  somewhat  arbitrary,  although  it  is 
felt  that  it  would  be  optimum  in  that  it  would  be  sensitive  enough  to 
detect  significant  differences  in  opinion  but  at  the  same  time  would  not 
tempt  the  pilot  into  reporting  differences  which  could  not  be  statistically 
confirmed. 

In  any  event,  the  scale  values  given  in  this  report  can  be  linearly 
transformed  to  any  interval  base  from  the  9-point  scale  on  which  they 
were  based. 
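
A linear transformation of this kind can be sketched as below. The endpoints are assumptions made for illustration: the source interval is taken as 0 to 9 and the target as a 1-to-7 scale; neither pair of endpoints is stated in this form in the report.

```python
def rescale(value, src=(0.0, 9.0), dst=(1.0, 7.0)):
    """Linearly map a scale value from the src interval to the dst interval.

    The default endpoints are assumed for illustration only. A linear map
    preserves the ratios of intervals, so the interval property is kept.
    """
    src_lo, src_hi = src
    dst_lo, dst_hi = dst
    return dst_lo + (value - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo)


# Example with a transformed-score mean from Table B-II
# ("Fair handling qualities", 4.15 on the original base).
print(rescale(4.15))
```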
[Figure 46 scale axis: Favorability of Handling Qualities]

Figure 46. A Global Rating Scale for Handling Qualities Evaluation


Two  rather  negative  and  disappointing  conclusions  regarding  the 
investigated  scales  are: 

1. Neither the existence of the Cooper boundaries
(i.e., the satisfactory-unsatisfactory boundary at
3.5, and the acceptable-unacceptable boundary at
6.5) nor their stability relative to the scale
descriptors could be verified. This verification
is considered the final link necessary to prove
the validity of the excellent decision tree type of
process introduced in the Cooper-Harper scale (3).
An experiment which would demonstrate boundary
existence is suggested in Section V.D.

2. The trait ratings, which had initially been proposed
to construct auxiliary scales for the purpose of
rooting out specific physical vehicle deficiencies
for the engineer, were disappointing. The variability
and lack of consistency between the two pilots indicate
that the traits chosen for investigation are not selective.
A large population of pilots, together with the
computer aids of regression and factor analyses,
potentially could provide the desired relationships,
but the likely attendant confidence levels
would make the usefulness of such ratings doubtful.

The  investigation  of  the  possibility  of  obtaining  numerical  data 
when  using  the  semantic  differential  technique  has  been  suggested  in 
Section V.D as a possible alternative to the trait ratings. Some
additional  research  into  scaling  techniques  would  be  required. 

B.  SUMMARY OF RATING CORRELATIONS WITH PILOT,
VEHICLE, AND SYSTEM PARAMETERS

The considerable data available indicate that, where closed-loop
compensatory  tracking  is  the  task,  the  pilot's  increments  in  rating 
are  based  on  the  relative  difficulty  with  which  he  obtains  and  maintains 
the  specified  performance.  An  estimate  of  performance  is  obtained 
directly.  An  indication  of  the  difficulty  involved,  however,  is  not 
so  obvious.  Perhaps  the  most  direct  measures,  judging  from  the  data, 
are  the  gain  required  of  the  pilot,  which  directly  determines  muscular 
activity and sensitivity, and the equalization required of the pilot
for  stability. 

The  interactions  between  these  parameters  and  the  other  system 
parameters are quite complex; nevertheless, a growing body of literature
is available to aid the engineer in estimating them. Rating correlations
with  other  parameters  are  also  shown  to  be  of  potential  use  to  the 
engineer  in  rating  estimation,  but  are  less  direct. 

The  notion  that  task  performance  and  difficulty  are  the  causal 
factors  of  pilot  ratings  was  further  supported  by  an  experiment 
measuring  an  "attention  level"  related  parameter.  A  secondary  task 
was  used  to  "load"  the  pilot  so  that  primary  performance  began  to 
deteriorate.  The  correlations  given  in  Section  V.C  show  that  good 
agreement  exists  between  the  level  of  difficulty  attainable  with  the 
secondary  task  and  the  rating  for  primary  task  alone.  This  application 
of  a  secondary  task  to  find  the  "attention  level"  or  "excess  capacity" 
of the pilot has an excellent potential of becoming an objective
measure of pilot rating which can be related directly to pilot and
system parameters.

The technique was not optimized, nor has any supporting theory been
evolved. The results indicate that the application does have the
potential of supplying the handling qualities community with a "pilot rating
thermometer." It is therefore recommended that some additional work be
carried out along the lines of optimization of the technique, and that
some effort be directed at a theory connecting secondary loading score
with primary effective time delay, channel capacity, maximum data rates,
etc.

A  negative  conclusion  can  be  drawn  from  the  neuromuscular  tension 
data.  It  was  initially  hypothesized  that  the  task  difficulty  would  also 
be  reflected  by  the  overall  muscular  tension  level,  which  could  even  be 
a  primary  "cause"  of  decrement  in  rating.  The  data  did  not  bear  this 
out,  however.  The  average  tension  level  did  increase  with  increased 
stick  displacement,  which  is  a  rather  trivial  result,  but  also  a  result 
which  confirms  the  accuracy  of  the  measurement  method. 

The  limited  number  of  participating  pilots  (two)  precluded  the 
discovery  of  any  "set"  or  "motivational"  rules.  The  correlation  results 
are  thus  really  only  applicable  to  incremental  changes  in  rating.  It  is 
suggested that the problem will be extremely difficult to quantify.
Therefore, another appeal will be made here to the engineer: thoroughly specify
the task, including required performance. Publish the task specification
along  with  the  data.  Only  then  can  useful  data  be  interchanged  between 
experimenters  and  designers. 

Finally, because of the vast amount of data accumulated during this
study, the choice between correlations of parameters versus Cooper ratings
or versus ψ ratings had to be made in many places for the sake of space
and economy. Since so many previous Cooper rating correlations exist,
and because such a wide audience has been exposed to them, the Cooper
rating was usually selected. However, it has been shown here that the
bad end of the Cooper scale can be misleading because of a pilot's lack
of sensitivity at that end. It is therefore suggested that a scale similar
to that shown in Fig. 46 be developed. Any averaging will then be
legitimate, variabilities will be constant across the scale, and the
number of necessary trials will be fixed across the scale.


APPENDIX A


RATING SCALE SURVEY AND RESULTING RAW DATA

A questionnaire designed to determine the semantic values of 64
handling qualities descriptive phrases was received from 67
professionals in the piloting, engineering, and human factors fields.

Of  those  received,  four  were  discarded  because  of  grossly  incorrect 
interpretation  of  the  experiment.  The  instructions,  experience  form, 
phrases,  and  the  first  page  of  the  response  sheets  are  given  here. 

The  responses  were  read  off  the  axes  to  the  nearest  tenth  of  a 
division and tabulated. The tabulations follow in Table A-I and
present  the  raw  data  used  in  the  successive  interval  digital 
program. 


A  QUESTIONNAIRE  TO  EXPERIMENTALLY  DETERMINE  THE  PSYCHOLOGICAL  INTERVAL 
BETWEEN  SOME  PHRASES  COMMONLY  FOUND  IN  THE  HANDLING  QUALITIES  LITERATURE 

INSTRUCTIONS 


The  purpose  of  this  questionnaire  is  to  evaluate  the  meanings  of  some 
words  and  phrases  commonly  used  in  handling  qualities  literature,  flight 
test  reports,  and  pilot  rating  scales.  The  need  to  make  such  an  evalua¬ 
tion  has  become  apparent  from  inconsistencies  and  ambiguities  in  present 
scales.  Hopefully,  the  results  of  this  questionnaire  will  allow  some 
modifications  to  existing  scales  which  will  vastly  improve  their  utility. 

At the beginning of the questionnaire is a list of the words and
phrases which we hope to evaluate. The list is presented at the beginning
so that you can familiarize yourself with the types of phrases, the spread
that each type covers, and the way in which they interrelate with each
other. You will notice that the phrases refer to characteristics such as
controllability, sensitivity, etc., in varying degrees. Perhaps a good
way to become familiar with them would be to look for the extremes of each
characteristic (i.e., the "best" and the "worst") in the list. In any
modified scale we will probably combine some of these phrases if they seem
to have similar psychological weights, or degrees of goodness, to you.

Most  of  the  words  and  phrases  used  are  expected  to  be  completely 
familiar.  However,  the  use  of  the  term  "primary  and/or  secondary 
responses"  needs  some  explanation.  This  phrase  is  intended  to  make  you 
think  of  two  kinds  of  responses  —  the  first,  the  direct  (and  desired) 
result  of  control  actions,  e.g.,  roll  to  a  specified  bank  angle;  the 
second,  the  indirect  motions  which  also  occur,  e.g.,  sideslipping  and 
yawing.  In  the  vertical  plane  a  pertinent  example  is  the  "secondary" 
altitude  or  speed  perturbations  following  a  "primary"  change  in  pitch 
attitude.  Notice  that  "secondary"  responses  are  desirable  (e.g.,  air¬ 
speed  change  or  turn  rate)  when  they  are  of  the  proper  form. 

In the questionnaire itself the phrases are presented individually in
a random manner alongside a vertical bar graph. Imagine that you are
reading the phrase in a handling qualities or flight test report, and that the
test pilot is describing a vehicle which he has tested. When you have
formed an impression of the vehicle, document your impression by placing
an "X" on the vertical line in the appropriate spot. If you feel that the
phrase describes a vehicle with the best imaginable handling qualities,
your "X" would belong at the very top of the line. Conversely, the worst
imaginable handling qualities should be rated at the very bottom edge of
the scale. The marks along the scale are intended only to help you
precisely place your "X" on the vertical line. The scale should be
considered continuous. To carry out the experiment:

1. Please fill out the experience form (page 2).

2. Study the list of phrases (pages 3-5) long enough to become
familiar with them (rereading the second paragraph above
may help you).

3. Then reread this entire page so that the purposes and
instructions are clear.

4. Then turn to the questionnaire (page 6) and start working
through the phrases. Please work through them in order,
and do not turn back to the pages listing the phrases.
RATING SCALE EXPERIENCE


Name: ____________

Occupation: ____________

Location: ____________

Experience relevant to rating scales obtained as:

□ Pilot

□ Test Pilot

□ Handling Qualities Engineer

□ Psychologist

□ Human Factors

□ Other: ____________


If pilot, total hours (approximately) ........ hr

Military fighters ........ hr

Heavy aircraft (bombers, transports) ........ hr

Light aircraft ........ hr

Helicopters ........ hr

Instrument ........ hr

Simulator ........ hr


Rating scales with which you are familiar:

□ Cooper's (NASA)

□ Cornell Aeronautical Laboratory

□ Other: ____________


Approximate time spent evaluating with rating scale ........ hr

Fixed-base simulator ........ hr

Moving-base simulator ........ hr

Aircraft ........ hr
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Date: 

Age: 


WORDS  AND  PHRASES  TO  BE  EVALUATED 


1. Fair, somewhat impure primary or secondary response characteristics.

2. Excellent, pure (i.e., no "accidental" excitation) primary and
secondary response characteristics.

3. Moderately sensitive, sluggish or uncomfortable in primary or
secondary responses.

4. Barely controllable.

5. Easy to control with fair precision.

6. Major improvements are needed.

7. Highly desirable handling qualities.

8. Controllable with fair but somewhat inadequate precision.

9. Moderately objectionable deficiencies.

10. Very objectionable deficiencies.

11. Extremely easy to control with excellent precision.

12. Difficult to control.

13. Requires maximum available pilot skill and attention to retain control.

14. Some minor but annoying deficiencies.

15. Marginally controllable.

16. Completely demanding of pilot attention, skill or effort.

17. Excellent handling qualities.

18. Controllable, but only very imprecisely.

19. Extremely sensitive, sluggish or uncomfortable in primary or secondary
response.

20. Effect of deficiencies on performance is easily compensated for by pilot.

21. Largely undemanding of pilot; relaxed.

22. Nearly uncontrollable.

23. Some mildly unpleasant characteristics.

24. Fair handling qualities.

25. Controllable with somewhat inadequate precision.

26. Improvement is requested.

27. Quite sensitive, sluggish or uncomfortable in primary or secondary
responses.

28. Uncontrollable.

29. Very demanding of pilot attention, skill or effort.

30. Completely undemanding of pilot; very relaxed and comfortable.

31. Very difficult to control.

32. Pilot compensation required for acceptable performance in mission is
too high.

33. Mildly demanding of pilot attention, skill or effort.

34. Controllable with definitely inadequate precision.

35. Very sensitive, sluggish or uncomfortable in primary or secondary
responses.

36. Very bad handling qualities.

37. Good, relatively pure, primary and secondary response characteristics.

38. Requires substantial pilot skill and attention to retain control and
continue mission.

39. Somewhat undesirably demanding of pilot attention, skill or effort.

40. Improvement is needed.

41. Good handling qualities.

42. Mildly sensitive, sluggish or uncomfortable.

43. Very easy to control with good precision.

44. Requires best available pilot compensation to achieve minimum acceptable
performance.

45. Controllable with fair, but somewhat inadequate precision.

46. Much too sensitive, sluggish or uncomfortable in primary or secondary
responses.

47. Good enough for mission.

48. Very poor handling qualities.

49. Pleasant handling qualities.

50. Controllable with difficulty.

51. Definitely sensitive, sluggish or uncomfortable in primary or
secondary responses.

52. Extremely demanding of pilot attention, skill or effort.

53. Major deficiencies.

54. Controllable with somewhat inadequate precision.

55. Objectionable deficiencies.

56. Definitely demanding of pilot attention, skill or effort.

57. Bad handling qualities.

58. Quite demanding of pilot attention, skill or effort.

59. Reasonable performance requires considerable pilot compensation.

60. Mandatory improvement required.

61. Controllable with poor precision.

62. Very objectionable deficiencies.

63. Demanding of pilot attention, skill or effort.

64. Poor handling qualities.
[First page of the response sheets: each phrase appears beside a vertical
line running from "Most Favorable" at the top to "Least Favorable" at the
bottom.]

1. Fair, somewhat impure primary or
secondary response characteristics.

2. Excellent, pure (i.e., no "accidental"
excitation) primary and secondary
response characteristics.

3. Moderately sensitive, sluggish or
uncomfortable in primary or
secondary responses.


TABLE A-I


RAW SCORES OF QUESTIONNAIRE SURVEY


APPENDIX B


REDUCED SURVEY DATA


The raw data given in Appendix A were processed in several ways to
obtain the desired relationships. The results are given in Table B-I,
where the column numbers correspond to the following calculations:

(1), (2)  The grand means and variances were computed for
          all of the items. The items were then rank-ordered
          by mean.

(3), (4)  The scores for each rater were transformed as
          described in Section III-C-3b. The grand means
          and variances were then computed for the
          transformed scores.

(5), (6)  The scale values and discriminal dispersions were
          computed for the 63 items (No. 28, uncontrollable,
          was not included for reasons noted in Section III)
          with a digital program [Cumrey (17)].

(7)       The scale values were recomputed after the high
          variability items were excluded, leaving 31 items
          to be scaled. The retained items are given in
          Table B-II.
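
The first calculation (grand means and variances, followed by rank-ordering by mean) can be sketched as follows. The scores are hypothetical, the function name is illustrative, and the use of the sample (n - 1) variance is an assumption, since the report does not state which variance estimator was used.

```python
from statistics import mean, variance


def rank_items(scores):
    """Return (item_no, grand_mean, variance) rows, rank-ordered by mean.

    `scores` maps an item number to the list of scores given by the raters.
    The sample variance (n - 1 denominator) is assumed here.
    """
    rows = [(item, mean(vals), variance(vals)) for item, vals in scores.items()]
    return sorted(rows, key=lambda row: row[1])


# HYPOTHETICAL raw scores for three items from three raters.
raw = {
    57: [7.0, 8.0, 9.0],
    17: [0.5, 1.0, 1.5],
    24: [4.0, 4.0, 5.0],
}
for item, m, v in rank_items(raw):
    print(item, round(m, 2), round(v, 2))
```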
TABLE B-I

SUMMARY OF RESULTS OF PROCESSING THE QUESTIONNAIRE DATA OF APPENDIX A
TABLE B-II

ITEMS INCLUDED IN LOW-VARIABILITY CALCULATION (COL. 7) OF TABLE B-I

                                                              TRANSFORMED SCORE
ITEM NO.  ITEM                                                MEAN    VARIANCE

Handling Qualities
17        Excellent handling qualities                        0 (remainder illegible)
7         Highly desirable handling qualities                 0.45 (only one value legible)
41        Good handling qualities                             [illegible]
49        Pleasant handling qualities                         [illegible]
24        Fair handling qualities                             4.15    1.59
57        Bad handling qualities                              7.74    1.81
36        Very bad handling qualities                         8.22    1.61

Control
11        Extremely easy to control with excellent precision  0.97    0.44
43        Very easy to control with good precision            1.76    0.65
5         Easy to control with fair precision                 5.21    1.15
54        Controllable with somewhat inadequate precision     5.45    1.28
18        Controllable, but only very imprecisely             6.65    1.59
12        Difficult to control                                7.18    1.67
31        Very difficult to control                           8.15    1.18
22        Nearly uncontrollable                               8.91    0.59

Precision
11        Extremely easy to control with excellent precision  0.97    0.44
43        Very easy to control with good precision            1.76    0.65
5         Easy to control with fair precision                 5.21    1.15
25        Controllable with somewhat inadequate precision     5.45    1.40
18        Controllable, but only very imprecisely             6.65    1.59

Response Characteristics
2         Excellent, pure (i.e., no accidental excitation)    0.99    0.49
          primary and secondary response characteristics
37        Good, relatively pure, primary and secondary        2.47    0.88
          response characteristics
1         Fair, somewhat impure primary or secondary          4.62    2.43
          response characteristics
27        Quite sensitive, sluggish or uncomfortable in       6.00    2.49
          primary or secondary responses
19        Extremely sensitive, sluggish or uncomfortable      7.10    1.94
          in primary or secondary responses

Effects of Deficiencies
20        Effect of deficiencies on performance is easily     4.04    1.33
          compensated for by pilot
14        Some minor but annoying deficiencies                4.50    1.59
9         Moderately objectionable deficiencies               5.57    1.48
53        Major, very objectionable deficiencies              7.65    1.64

Demands on Pilot
30        Completely undemanding of pilot; very relaxed       1.65    0.94
          and comfortable
21        Largely undemanding of pilot; relaxed               2.36    0.98
33        Mildly demanding of pilot attention, skill or       4.22    1.39
          effort
63        Demanding of pilot attention, skill or effort       5.88    1.70
29        Very demanding of pilot attention, skill or effort  7.50    1.86
16        Completely demanding of pilot attention, skill      8.56    1.41
          or effort
APPENDIX C


TABULATION  OF  EXPERIMENTAL  MEASURES  AND  DESCRIBING  FUNCTIONS 

This  appendix  contains  the  describing  function  plots  which  were 
computed  for  pilot  JDM,  and  a  tabulation  (Table  C-I)  of  the  experimental 
measures  made  during  the  trials.  The  curve  fits  for  the  describing  func¬ 
tions  are  shown  on  the  figures  themselves.  The  describing  function  figures 
are  identified  by  run  number  and  controlled  element,  and  are  in  chrono¬ 
logical  order.  (The  run  number  gives  the  year,  month,  day,  and  number  of 
run on that day. Thus, 671002-3 was the third run on October 2, 1967.)
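
The run-number convention just described can be decoded mechanically. The sketch below is illustrative only; the function name is hypothetical, and it assumes that the two-digit year always falls in the 1900s.

```python
from datetime import date


def parse_run_number(run_number):
    """Split a run number such as '671002-3' into (date, run index).

    Layout per the text: year, month, day, then the number of the run on
    that day. The 1900s century is an assumption.
    """
    stamp, _, index = run_number.partition("-")
    year = 1900 + int(stamp[0:2])
    month = int(stamp[2:4])
    day = int(stamp[4:6])
    return date(year, month, day), int(index)


run_date, run_index = parse_run_number("671002-3")
print(run_date.isoformat(), run_index)
```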

ω₁ = 1.88 rad/sec        σ₁ = 0.5 cm/sec
ω₂ = 2.89 rad/sec        σ₂ = 1.0 cm/sec
ω₃ = 4.78 rad/sec        σ₃ = 1.5 cm/sec
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RUN  LOG  AND  SUMMARY  OF  PARAMETERS  MEASURED  DURING  THE  EXPERIMENTS 
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TABLE  C-I  (Continued) 


[Describing function plots for pilot JDM, one figure per run: phase of
YpYc (deg) and of Yp (deg) plotted against ω (rad/sec) from 0.1 to 10.0.
Legible figure annotations: Run Number 671006-11, Yc = 10/s, pilot JDM;
Run Number 671006-31, Yc = 5/s², input B6-1.88-1, pilot JDM;
Yc = 1/s(s+1), input B6-1.88-1, pilot JDM; Yc = 1/s, input B6-1.88-.5,
pilot JDM; Run Number 671129-09, input B6-1.88-1, pilot JDM.]

APPENDIX  D 


COOPER RATINGS FROM STUDY OF McRUER (5)

The data contained in Table D-I are Cooper ratings which were obtained
during an experimental series in 1965. These data, together with the
Cornell ratings shown in Figs. 40 and 41, provide a valuable comparison
between the Cooper and Cornell scales which has heretofore not been
available.
13. ABSTRACT

Although  rating  scales  of  varied  forms  have  been  widely  used  to  estimate  and 
evaluate  handling  qualities  over  the  past  decade,  a  number  of  deficiencies  in  both 
method  and  data  base  have  been  apparent.  This  investigation  was  aimed  at  overcoming 
many  of  these  deficiencies  by  attempting  to  resolve  the  difficulties  experienced  with 
rating scales themselves, and by extending and adding to already existing
relationships between ratings and pilot/vehicle system parameters.

Rating  scales  have  come  under  increasing  criticism  for  problems  such  as  wording 
ambiguity,  the  dual  mission  character  of  some  scales,  the  nonuniformity  in  the 
distribution  of  descriptors  across  the  scale,  and  the  misuse  of  scales  which  has 
occurred  when  ratings  have  been  averaged.  Psychometric  methods  provide  an  approach 
to  these  problems,  and  in  this  study  were  used  to  scale  several  phrases  descriptive 
of  vehicle  handling  qualities.  Thus,  quantitative  characteristics  were  derived  for 
contemporary  scales  through  the  use  of  the  Method  of  Successive  Intervals. 

An  experiment  was  conducted  which  added  to  available  data  relating  Cooper  ratings 
and pilot/vehicle parameters, and which also tested some potential alternate scale
candidates.  The  correlation  results  indicate  that  ratings  are  probably  based  on 
performance  and  the  degree  of  difficulty  experienced  in  maintaining  the  performance. 
The  difficulty  is  most  easily  represented  by  the  pilot  equalization  required  and  the 
vehicle  stick  characteristics. 
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